produce feedback to the letter level, reinforcing letter sequences which spell
words. The model can account for the basic findings on the perception of
pronounceable nonwords as well as words. The account is based on the idea that
pseudowords can also activate representations of words, even though they do not
match any word perfectly. As with word displays, feedback from the activated
words reinforces the letters presented, thereby increasing their perceptibility.
The model also accounts for the role of masking in determining the magnitude of
the various effects, the fact that expectations influence perception of letters
in words, and the fact that effects of contextual constraint and letter
cluster frequency are obtained under some conditions and not others.
McClelland & Rumelhart



As we perceive, we are continually extracting sensory information to 
guide our attempts to determine what is before us. In addition, we bring to 
perception a wealth of knowledge about the objects we might see or hear and 
the larger units in which these objects co-occur. As one of us has argued for 
the case of reading (Rumelhart, 1977), our knowledge of the objects we might be
perceiving works together with the sensory information in the perceptual
process. Exactly how does the knowledge which we have interact with the input?
And, how does this interaction facilitate perception? 

In this two-part article we have attempted to take a few steps toward 
answering these questions. We consider one specific example of the interac- 
tion between knowledge and perception — the perception of letters in words 
and other contexts. In Part I we examine the main findings in the literature 
on perception of letters in context, and develop a model called the
interactive activation model to account for these effects. In Part II (Rumelhart &
McClelland, forthcoming) we extend the model in several ways. We present a 
set of studies introducing a new technique for studying the perception of 
letters in context, independently varying the duration and timing of the
context and target letters. We show how the model fares in accounting for the
results of these experiments and discuss how the model may be extended to an 
account of the pronunciation of nonwords. We also explore the influence of 
higher-level (semantic and syntactic) inputs to the perceptual process, not
only for the case of visual word perception but for the perception of speech 
as well. Finally, we consider how the mechanisms developed in the course of 
exploring our model of perception might be used in other sorts of processes,
such as categorization, memory search, and retrieval. 
Basic Findings on the Role of Context in Perception of Letters

The notion that knowledge and familiarity play a role in perception has 
often been supported by experiments on the perception of letters in words or
word-like letter strings (Bruner, 1957; Neisser, 1967). It has been known for
nearly 100 years that it is possible to identify letters in words more
accurately than letters in random letter sequences under tachistoscopic
presentation conditions (Cattell, 1886; see Huey, 1908, and Neisser, 1967, for
reviews). However, until recently such effects were obtained using whole 
reports of all of the letters presented. These reports are subject to
guessing biases, so that it was possible to imagine that familiarity did not
determine how much was seen but only how much could be inferred from a
fragmentary percept. In addition, for longer stimuli, full reports are subject
to forgetting. We may see more letters than we can actually report in the case
of nonwords, but when the letters form a word we may be able to retain the item as a
single unit whose spelling may simply be read out from long-term memory. 
Thus, despite strong arguments to the contrary by proponents of the view that 
familiar context really did influence perception, it has been possible until 
recently to imagine that the context in which a letter was presented only 
influenced the accuracy of post-perceptual processes, and not the process of
perception itself.

The perceptual advantage of letters in words. The seminal experiment of 
Reicher (1969) seems to suggest that context does actually influence percep- 
tual processing. Reicher presented target letters in words, unpronounceable 
nonwords, and alone, following the presentation of the target display with a 
presentation of a patterned mask. The subject was then tested on a single 
letter in the display, using a forced choice between two alternative letters.
Both alternatives fit the context to form an item of the type presented, so
that, for example, in the case of a word presentation, the alternative would
also form a word in the context. 

Forced choice performance was more accurate for letters in words than for 
letters in nonwords or even for single letters. Since both alternatives made
a word with the context, it is not possible to argue that the effect is due to
post-perceptual guessing based on equivalent information extracted about the
target letter in the different conditions. It appears that subjects actually
come away with more information relevant to a choice between the alternatives
when the target letter is part of a word. And, since one of the control
conditions was a single letter, it is not reasonable to argue that the effect
is due to forgetting letters that have been perceived. It is hard to see how
a single letter, once perceived, could be subject to a greater forgetting than 
a letter in a word. 

Reicher's finding seems to suggest that perception of a letter can be 
facilitated by presenting it in the context of a word. It appears, then, that 
our knowledge about words can influence the process of perception. 

Our model presents a way of bringing such knowledge to bear. The basic 
idea is that the presentation of a string of letters results in partial 
activation of representations of letters consistent with the visual input. 
These activations in turn produce partial activations of representations of 
words consistent with the letters, if there are any. The activated
representations of words then produce feedback which serves to reinforce the
activations of the representations of letters. As a result, letters in words
are more perceptible, because they receive more activation than representations
of either single letters or letters in unrelated context.

Reicher's basic finding has been investigated and extended in a large
number of studies, and there now appears to be a set of important related
findings that must also be explained. Here follows a brief discussion of
several further results which seem to be both basic and well established.

Irrelevance of word shape. The perceptual advantage for letters in words
does not depend on presenting words in visually distinctive, or even familiar,
forms. Typically, the effects are obtained using words typed in all uppercase
type, which minimizes configurational aspects of words as visual forms.
In addition, the word advantage over nonwords can be obtained using stimuli
presented in mixed upper and lower case type (Adams, 1979; McClelland, 1976).
Although performance is affected by mixing upper and lower case letters in the
same string, the disruption is of about the same magnitude for letters in
nonwords as it is for letters in words, as long as both types of items are tested
at comparable performance levels (Adams, 1979). It is therefore clear that
the word advantage depends on presenting the target letter in the context of
an item which, together with the target, forms a familiar arrangement of
letters, independent of its actual visual form.

Dependence on masking. The word advantage over single letters and
nonwords appears to depend upon the visual conditions used (Johnston &
McClelland, 1973; Massaro & Klitzke, 1979; see also Juola, Leavitt & Choe, 1974;
and Taylor & Chabot, 1978). The word advantage is quite large when the target
appears in a distinct, high-contrast display followed by a patterned mask of
similar characteristics. However, the word advantage over single letters is
actually reversed, and the word advantage over nonwords becomes quite small,
when the target is indistinct, low in contrast, and followed by a blank,
nonpatterned field. Recently, it has also been shown that the word advantage
over single letters is greatly reduced if the patterned mask contains letters
instead of nonletter patterns (Johnston & McClelland, in press; Taylor &
Chabot, 1978).

Extension to pronounceable nonwords. The word advantage also applies to
pronounceable nonwords, such as REET or MAVE. A large number of studies
(Aderman & Smith, 1971; Baron & Thurston, 1973; Carr, Davidson & Hawkins,
1978; Spoehr & Smith, 1975) have shown that letters in pronounceable nonwords
(also called pseudowords) have a large advantage over letters in
unpronounceable nonwords (also called unrelated letter strings), and three
studies (Carr et al., 1978; Massaro & Klitzke, 1979; McClelland & Johnston,
1977) have obtained an advantage for letters in pseudowords over single letters.

It now appears that the pseudoword advantage depends on the subjects'
expectations (Aderman & Smith, 1971; Carr et al., 1978). Carr et al. (1978)
found that if subjects are under the impression that pseudowords might be
shown, performance on pseudowords is almost as accurate as performance on
letters in words. But if they do not expect any pseudowords, performance on
these items is not much better than performance on unpronounceable nonwords.
Interestingly, Carr et al. (1978) found that the word advantage did not depend
on expectations. There was a sizable advantage for letters in words over
letters in unrelated context whether the subject expected words or only
unrelated letter strings.
Another important fact about performance on pseudowords is that
differences in letter cluster frequency do not appear to influence accuracy of
perception of letters in either words or pseudowords (McClelland & Johnston,
1977).

Absence of constraint effects. One important finding which rules out
several of the models which have been proposed previously is the finding that
letters in highly constraining word contexts have little or no advantage over
letters in weakly constraining contexts under the distinct target/patterned
mask conditions which produce a large word advantage (Johnston, 1978; see also
Estes, 1975). For example, if the set of possible stimuli contains only
words, the context _HIP constrains the first letter to be either an S, a C, or
a W, whereas the context _INK is compatible with 12 to 14 letters (the exact
number depends on what counts as a word). We might expect that the former, 
more strongly constraining context, would produce superior detection of a tar- 
get letter, but, in a very carefully controlled and executed study, Johnston 
(1978) found a non-significant effect in the reverse direction. Although 
there are some findings suggesting that constraints do influence performance 
under other conditions, they do not appear to make a difference under the dis- 
tinct target/patterned mask conditions of the Johnston study. 

To be successful, any model of word perception must provide an account 
not only for Reicher's basic effect, but for the separate and joint effects 
(or lack thereof) due to visual conditions, stimulus structure, expectations, 
and constraints on the perception of letters in context. Our model provides 
an account for all of these effects. We begin by presenting the model in 
abstract form, then focus in on the details of the model, and present an 
example of the working of the model in a hypothetical experimental trial.
Subsequently, we turn to a detailed consideration of the findings discussed in
this section. In the final section of Part I, we also consider a few other
facts about the perception of letters in context and suggest how our model
might be extended to account for these effects as well.

The Interactive Activation Model 

We approach the phenomena of word perception with a number of basic
assumptions, which we want to incorporate into the model. First, we assume
that visual perception takes place within a system in which there are several
levels of processing, each concerned with forming a representation of the
input at a different level of abstraction. For visual word perception, we
assume that there is a visual feature level, a letter level, and a word level,
as well as higher levels of processing which provide "top-down" input to the
word level.

Second, we assume that visual perception involves parallel processing. 
There are two different senses in which we view perception as parallel. We 
assume that visual perception is spatially parallel. That is, we assume that 
information covering a region in space at least large enough to contain a 
four-letter word is processed simultaneously. In addition, we assume that 
visual processing occurs at several levels at the same time. Thus, our model
of word perception is spatially parallel (i.e., capable of processing several
letters of a word at one time) and involves processes which operate
simultaneously at several different levels. Thus, for example, processing at the
letter level presumably occurs simultaneously with processing at the word 
level, and with processing at the feature level. 
Third, we assume that perception is fundamentally an interactive
process. That is, we assume that "top-down" or "conceptually driven" processing
works simultaneously and in conjunction with "bottom-up" or "data driven" pro- 
cessing to provide a sort of multiplicity of constraints which jointly deter- 
mine what we perceive. Thus, for example, we assume that knowledge about the 
words of the language interacts with the incoming featural information in co- 
determining the nature and time course of the perception of the letters in the 
word.

Finally, we wish to implement these assumptions using a relatively simple 
method of interaction between sources of knowledge whose only "currency" is 
simple "excitatory" and "inhibitory" activations of a neural type. 

Figure 1 shows the general conception of the model. Perception is 
assumed to consist of a set of interacting levels, each level communicating 
with several others. Communication proceeds through a spreading activation 
mechanism in which activation at one level "spreads" to neighboring levels. 
The communication can consist of both excitatory and inhibitory messages. 
Excitatory messages increase the activation level of their recipients. Inhi- 
bitory messages decrease the activation level of their recipients. The arrows 
in the diagram represent excitatory connections and the circular ends of the 
connections represent inhibitory connections. The intra-level inhibitory loop 
represents a kind of lateral inhibition in which incompatible units at the
same level compete. For example, since a string of, say, four letters can be
interpreted as at most one four-letter word, the various possible words mutu- 
ally inhibit one another and in that way compete as possible interpretations 
of the string. 






Interactive Activation Model
Part I



[Figure 1: diagram of interconnected processing levels, with inputs labeled
Higher Level Input, Visual Input, and Acoustic Input.]

Figure 1. A sketch of some of the processing levels involved in visual and
auditory word perception, with interconnections.




It is clear that there are many levels which are important in reading and 
perception in general, and the interactions among these levels are important
for many phenomena. However, a theoretical analysis of all of these interac- 
tions introduces an order of complexity which obscures comprehension. For 
this reason, we have restricted the present analysis to an examination of the 
interaction between a single pair of levels, the word and letter levels. We 
have found that we can account for the phenomena reviewed above by considering 
only the interactions between letter level and word level elements. There- 
fore, for the present we have elaborated the model only on these two levels, 
as illustrated in Figure 2. We have delayed consideration of the effects of 
higher-level processes and/or phonological processes, and we have ignored the 
reciprocity of activation which may occur between word and letter levels and 
any other levels of the system. We consider aspects of the fuller model 
including these influences in Part II.

Specific Assumptions 

Representation assumptions. For every relevant unit in the system we
assume there is an entity called a node. We assume that there is a node for
each word we know, and that there is a node for each letter in each position.

The nodes are organized into levels. There are word level nodes and
letter level nodes. Each node has connections to a number of other nodes.
The set of nodes to which a node connects is called its neighbors. Each
connection is two-way. There are two kinds of connections: excitatory and
inhibitory. If the two nodes suggest each other's existence (in the way that the
node for the word 'the' suggests the node for an initial 't' and vice versa),
then the connections are excitatory. If the two nodes are inconsistent with
one another (in the way that the node for the word 'the' and the node for the
word 'boy' are inconsistent), then the relationship is inhibitory. (Note that
we identify nodes by the units they detect, placing them in quotes; stimuli
presented to the system are typed in uppercase letters.)

Figure 2. The simplified processing system considered in Part I.

Connections may occur within levels or between adjacent levels. There
are no connections between non-adjacent levels. Connections within the word
level are mutually inhibitory since only one word can occur at any one place
at any one time. Connections between the word level and letter level may be
either inhibitory or excitatory (depending on whether or not the letter is a
part of the word in the appropriate letter position). We call the set of
nodes with excitatory connections to a given node its excitatory neighbors.
We call the set of nodes with inhibitory connections to a given node its
inhibitory neighbors.
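The neighborhood structure just defined can be made concrete with a small Python sketch. It assumes a toy lexicon of equal-length words; the lexicon and the function names `build_letter_word_neighbors` and `build_word_word_neighbors` are hypothetical illustrations, not the original simulation program. Each letter-in-position node excites the words that have that letter in that position and inhibits all other words, and every word inhibits every other word.

```python
from itertools import product


def build_letter_word_neighbors(lexicon, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Map each (position, letter) node to its excitatory and inhibitory word neighbors.

    Assumes every word in the (hypothetical) lexicon has the same length.
    Connections are two-way, so the same sets also describe word-to-letter links.
    """
    width = len(lexicon[0])
    neighbors = {}
    for pos, letter in product(range(width), alphabet):
        # A letter in position `pos` supports exactly the words with that letter there.
        excite = {w for w in lexicon if w[pos] == letter}
        # Every other word is inconsistent with it, hence an inhibitory neighbor.
        inhibit = set(lexicon) - excite
        neighbors[(pos, letter)] = {"excitatory": excite, "inhibitory": inhibit}
    return neighbors


def build_word_word_neighbors(lexicon):
    """Within the word level every connection is inhibitory: each word inhibits all others."""
    return {w: set(lexicon) - {w} for w in lexicon}


if __name__ == "__main__":
    lexicon = ["the", "boy", "toy"]
    letter_word = build_letter_word_neighbors(lexicon)
    print(sorted(letter_word[(0, "t")]["excitatory"]))  # ['the', 'toy']
    print(sorted(letter_word[(0, "t")]["inhibitory"]))  # ['boy']
    print(sorted(build_word_word_neighbors(lexicon)["the"]))  # ['boy', 'toy']
```

Storing neighbors as sets keyed by node keeps the two-way connections implicit: a word's letter neighbors can be recovered by scanning the same table.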

A subset of the neighbors of the letter 't' is illustrated in Figure 3.
Again, excitatory connections are represented by arrows ending with points and 
inhibitory connections are represented by arrows ending with dots. We 
emphasize that this is a small subset of the neighborhood of the initial 't'. 
The picture of the whole neighborhood, including all the connections among 
neighbors and their connections to their neighbors, is much too complicated to
present in a two-dimensional figure. 

Activation assumptions. There is, associated with each node, a momentary
level of activation. This level of activation is a real number, and for node
$i$ we will represent it by $a_i(t)$. Any node with a positive degree of
activation is said to be active. In the absence of inputs from its neighbors,
all nodes are assumed to decay back to an inactive state; that is, to an
activation value at or below zero. This resting level may differ from node to
node, and corresponds to a kind of a priori bias (Broadbent, 1967), determined
by frequency of activation of the node over the long term. Thus, for example,
the nodes for high frequency words have resting levels higher than those for
low frequency words. In any case, the resting level for node $i$ is represented
by $r_i$. For units not at rest, decay back to the resting level occurs at some
rate $\theta_i$.

Figure 3. A few of the neighbors of the node for the letter 't' in the
first position in a word, and their interconnections.

When the neighbors of a node are active they influence the activation of
the node by either excitation or inhibition, depending on their relation to
the node. These excitatory and inhibitory influences combine by a simple
weighted sum to yield a net input to the unit, which may be either
excitatory (greater than zero) or inhibitory. In mathematical notation, if we let
$n_i(t)$ represent the net input to the unit, we can write the equation for its
value as

$$n_i(t) = \sum_j \alpha_{ij}\, e_j(t) - \sum_k \gamma_{ik}\, i_k(t), \tag{1}$$

where the $e_j(t)$s are the activations of the active excitatory neighbors of the
node, the $i_k(t)$s are the activations of the active inhibitory neighbors of the
node, and the $\alpha_{ij}$s and $\gamma_{ik}$s are associated weight constants.
Inactive nodes have no influence on their neighbors. Only nodes in an active
state have any effects, either excitatory or inhibitory.
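The net-input equation above can be sketched in a few lines of Python. The function name `net_input` and the (weight, activation) pair representation are assumptions for illustration; the example weights .07 and .04 are the letter-to-word values listed in Table 1. Neighbors with activation at or below zero are inactive and are skipped, as the text requires.

```python
def net_input(excitatory, inhibitory):
    """Net input n_i(t): weighted sum over active excitatory neighbors minus
    weighted sum over active inhibitory neighbors.

    `excitatory` and `inhibitory` are lists of (weight, activation) pairs.
    Neighbors with activation <= 0 are inactive and contribute nothing.
    """
    excitation = sum(w * a for w, a in excitatory if a > 0)
    inhibition = sum(w * a for w, a in inhibitory if a > 0)
    return excitation - inhibition


if __name__ == "__main__":
    # Two excitatory neighbors (the second one inactive) and one inhibitory neighbor.
    n = net_input([(0.07, 0.5), (0.07, -0.1)], [(0.04, 0.25)])
    print(round(n, 6))  # 0.025
```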

The net input to a node drives the activation of the node up or down
depending on whether it is positive or negative. The degree of the effect of
the input on the node is modulated by the node's current activity level, to
keep the input to the node from driving it beyond some maximum and minimum
values (Grossberg, 1978). When the net input is excitatory ($n_i(t) > 0$), the
effect on the node is given by

$$\epsilon_i(t) = n_i(t)\,\bigl(M - a_i(t)\bigr), \tag{2}$$

where $M$ is the maximum activation level of the unit. The modulation has the
desired effect because as the activation of the unit approaches the maximum,
the effect of the input is reduced to zero.

In the case where the input is inhibitory ($n_i(t) < 0$), the effect of the
input on the node is given by

$$\epsilon_i(t) = n_i(t)\,\bigl(a_i(t) - m\bigr), \tag{3}$$

where $m$ is the minimum activation of the unit.
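A minimal Python sketch of equations (2) and (3), assuming the maximum and minimum activations listed in Table 1 ($M = 1.00$, $m = -.20$); the function name `input_effect` is hypothetical. It shows the modulation at work: the same excitatory input has a smaller effect as the node nears $M$, and the same inhibitory input has a smaller effect as the node nears $m$.

```python
def input_effect(n, a, M=1.0, m=-0.20):
    """Effect epsilon_i(t) of net input n on a node whose activation is a.

    Excitatory input is scaled by the remaining headroom (M - a) and
    inhibitory input by the distance to the floor (a - m), so the effect
    shrinks to zero as the activation approaches either bound.
    M and m default to the Table 1 values.
    """
    if n > 0:
        return n * (M - a)  # equation (2)
    return n * (a - m)      # equation (3); also gives 0 when n == 0


if __name__ == "__main__":
    for a in (0.0, 0.5, 0.9, 1.0):
        print(a, round(input_effect(0.5, a), 3))   # excitation shrinks toward M
    for a in (0.5, 0.0, -0.1, -0.2):
        print(a, round(input_effect(-0.5, a), 3))  # inhibition shrinks toward m
```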

The new value of the activation of a node at time $t + \Delta t$ is equal to the
value at time $t$, minus the decay, plus the influence of its neighbors at time
$t$:

$$a_i(t + \Delta t) = a_i(t) - \theta_i\,\bigl(a_i(t) - r_i\bigr) + \epsilon_i(t). \tag{4}$$
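The update equation above combines decay toward the resting level with the modulated input effect. The Python sketch below assumes the Table 1 decay rate of .07 for every node; the names `update` and `run` are hypothetical. With no input, an active node decays back to rest. With a constant excitatory input $n$ and a resting level of 0, activation settles where decay balances input, $\theta a = n(M - a)$, that is $a^* = nM/(\theta + n)$.

```python
def update(a, n, rest=0.0, theta=0.07, M=1.0, m=-0.20):
    """One step of the activation update: decay toward rest plus input effect."""
    effect = n * (M - a) if n > 0 else n * (a - m)  # equations (2) and (3)
    return a - theta * (a - rest) + effect


def run(a, n, rest=0.0, steps=200):
    """Apply the update `steps` times with a constant net input n."""
    for _ in range(steps):
        a = update(a, n, rest)
    return a


if __name__ == "__main__":
    # With no input, an active node decays back to its resting level.
    print(run(0.6, 0.0))
    # With constant excitatory input the node settles at n * M / (theta + n).
    print(run(0.0, 0.2), 0.2 / (0.07 + 0.2))
```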



Input assumptions. Upon presentation of a stimulus, a set of featural
inputs is assumed to be made available to the system. During each moment in
time each feature has some probability $p$ of being detected. Upon being
detected, the feature begins sending activation to all letter level nodes
which contain that feature. All letter level nodes which do not contain the
extracted feature are inhibited. The probability of detection and the rate at
which the feature excites or inhibits the relevant letter nodes are assumed to
depend on the clarity of the visual display. It is assumed that features are
binary and that we can extract either the presence or absence of a particular
feature. So, for example, when viewing the letter R we can extract, among
other features, the presence of a diagonal line segment in the lower right
corner and the absence of a horizontal line across the bottom.
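The input assumptions can be illustrated with a small Python sketch. The four-feature `TOY_FONT`, the function names, and the rates .005 and .15 are all hypothetical; only their 1/30 ratio is taken from Table 1. On each tick every undetected feature is picked up with probability $p$; each detected feature (presence or absence) then excites letters that share its value and inhibits letters that do not, so a single mismatching feature outweighs several matching ones.

```python
import random

# Hypothetical 4-feature toy font: 1 = feature present, 0 = feature absent.
TOY_FONT = {
    "R": (1, 1, 0, 1),
    "K": (0, 1, 0, 1),
    "O": (1, 0, 1, 0),
}


def detect_features(stimulus, p, detected, rng):
    """On one tick, each not-yet-detected feature is picked up with probability p.

    `detected` maps feature index -> extracted value (presence or absence)
    and is updated in place.
    """
    for i, value in enumerate(stimulus):
        if i not in detected and rng.random() < p:
            detected[i] = value
    return detected


def feature_to_letter_input(detected, font, excitation=0.005, inhibition=0.15):
    """Bottom-up net input to each letter node from the detected features.

    The rates are hypothetical; their ratio matches the 1/30 E/I ratio of Table 1.
    """
    net = {}
    for letter, features in font.items():
        total = 0.0
        for i, value in detected.items():
            total += excitation if features[i] == value else -inhibition
        net[letter] = total
    return net


if __name__ == "__main__":
    rng = random.Random(1)
    detected = {}
    for tick in range(5):
        detect_features(TOY_FONT["R"], p=0.5, detected=detected, rng=rng)
        print(tick, sorted(detected), feature_to_letter_input(detected, TOY_FONT))
```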

Presentation of a new display following an old one results in the
probabilistic extraction of the set of features present in the new display. These
features, when extracted, replace the old ones in corresponding positions.
Thus, the presentation of an O following the R described above would result in
the replacement of the two features described above with their opposites.
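Feature replacement by a following display can be sketched as follows, assuming just the two features named in the R example (a lower-right diagonal and a bottom horizontal); the function name `present_next_display` is hypothetical. On each tick, each feature of the new display is extracted with probability $p$ and, once extracted, overwrites the old value at the same index.

```python
import random


def present_next_display(detected, new_display, p, rng):
    """Extract features of a following display (e.g. a mask) probabilistically.

    Each feature of the new display that is extracted on this tick overwrites
    the old value held at the same feature index; the rest keep their old value.
    Returns a new dict and leaves `detected` unchanged.
    """
    updated = dict(detected)
    for i, value in enumerate(new_display):
        if rng.random() < p:
            updated[i] = value
    return updated


if __name__ == "__main__":
    # Two features: (diagonal in lower right, horizontal across bottom).
    R = {0: 1, 1: 0}  # R: diagonal present, bottom bar absent
    O = (0, 1)        # O: diagonal absent, bottom bar present
    rng = random.Random(0)
    state = R
    for tick in range(4):
        state = present_next_display(state, O, p=0.5, rng=rng)
        print(tick, state)
```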

The Operation of the Model 

Now, consider what happens when an input reaches the system. Assume that
at time $t_0$ all prior inputs have had an opportunity to decay, so that the
entire system is in its quiescent state and each node is at its resting level.
The presentation of a stimulus initiates a chain in which certain features are
extracted and excitatory and inhibitory pressures begin to act upon the letter
level nodes. The activation levels of certain letter nodes are pushed above
their resting levels. Others receive predominantly inhibitory inputs and are
pushed below their resting levels. These letter nodes, in turn, begin to send
activation to those word level nodes they are consistent with and inhibit
those word nodes they are not consistent with. In addition, the various
letter level nodes attempt to suppress each other, with the strongest ones
getting the upper hand. As word level nodes become active, they in turn
compete with one another and send excitation and inhibition back down to the
letter level nodes. If the input features were close to those for one
particular set of letters and those letters were consistent with those forming a
particular word, the positive feedback in the system will work to rapidly
converge on the appropriate set of letters and the appropriate word. If not,
they will compete with each other; perhaps no single set of letters or
single word will get enough activation to dominate the others, and through their
inhibitory relationships they might strangle each other. The exact details of
this process depend on the values of the various parameters of the model in ways
which we will explore as we proceed.
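The chain of events described above can be traced in a toy Python simulation. It is a sketch under strong simplifying assumptions, not the original program: one letter node per presented letter, a hypothetical constant bottom-up input of .10 in place of feature extraction, no letter-letter competition, a hypothetical four-word lexicon, and the Table 1 values for decay, bounds, letter-word, word-word, and word-letter weights. Comparing runs with and without top-down feedback shows the word node for the presented word coming to dominate and feeding activation back to its letters.

```python
THETA, M, m = 0.07, 1.0, -0.20  # decay rate, maximum, minimum (Table 1)
LW_EXC, LW_INH = 0.07, 0.04     # letter-to-word excitation / inhibition (Table 1)
WW_INH = 0.21                   # word-to-word inhibition (Table 1)
WL_EXC = 0.30                   # word-to-letter feedback excitation (Table 1)
BOTTOM_UP = 0.10                # hypothetical constant feature input to presented letters


def update(a, n, rest=0.0):
    effect = n * (M - a) if n > 0 else n * (a - m)
    return a - THETA * (a - rest) + effect


def simulate(stimulus, lexicon, cycles=40, feedback=True):
    """Toy run with one letter node per presented letter and one node per word."""
    letters = [0.0] * len(stimulus)
    words = {w: 0.0 for w in lexicon}
    for _ in range(cycles):
        word_net = {}
        for w in lexicon:
            n = 0.0
            for pos, ch in enumerate(stimulus):
                a = letters[pos]
                if a > 0:  # only active letters send signals
                    n += LW_EXC * a if w[pos] == ch else -LW_INH * a
            # lateral inhibition from every other active word
            n -= WW_INH * sum(v for u, v in words.items() if u != w and v > 0)
            word_net[w] = n
        letter_net = []
        for pos, ch in enumerate(stimulus):
            n = BOTTOM_UP
            if feedback:  # top-down support from active words with ch at pos
                n += WL_EXC * sum(v for w, v in words.items() if v > 0 and w[pos] == ch)
            letter_net.append(n)
        # synchronous update: all new values computed from the previous cycle
        words = {w: update(words[w], word_net[w]) for w in lexicon}
        letters = [update(a, n) for a, n in zip(letters, letter_net)]
    return letters, words


if __name__ == "__main__":
    lexicon = ["word", "weak", "fork", "work"]
    with_fb, words = simulate("work", lexicon)
    without_fb, _ = simulate("work", lexicon, feedback=False)
    print({w: round(v, 3) for w, v in words.items()})
    print([round(a, 3) for a in with_fb], [round(a, 3) for a in without_fb])
```

Because the feedback term is never negative, the letters of the presented word end up more active with feedback than without it, which is the mechanism the model uses for the word advantage.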

Simulations 

In the following example, as in the remainder of the paper, we illustrate 
the properties of the model with computer simulations. For purposes of these 
simulations we have made a number of other simplifying assumptions. These 
additional assumptions fall into four classes: 



(1) discrete rather than continuous time, 

(2) simplified feature analysis of the input font, 

(3) restrictions of the parameter space, and 

(4) a limited lexicon. 

The simulation of the model operates in discrete time slices or ticks, 
updating the activations of all of the nodes in the system once each cycle on 
the basis of the values on the previous cycle. Obviously, this is simply a 
matter of computational convenience, and not a fundamental assumption. We 
have endeavored to keep the time slices "thin" enough so that the model's
behavior is continuous for all intents and purposes.
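Why thin time slices matter can be checked with a short Python sketch. It assumes zero input and a resting level of 0, so each tick reduces to $a \leftarrow a - \theta a$, and it compares that discrete decay with the continuous solution $a_0 e^{-\theta t}$, treating one tick as one time unit (an assumption made for illustration). With a small per-tick rate such as the Table 1 value of .07 the two stay close; with a large rate they diverge badly.

```python
import math


def discrete_decay(a0, theta, ticks):
    """Zero input, resting level 0: each tick applies a <- a - theta * a."""
    a = a0
    for _ in range(ticks):
        a -= theta * a
    return a


def continuous_decay(a0, theta, t):
    """Continuous-time counterpart of the same decay: da/dt = -theta * a."""
    return a0 * math.exp(-theta * t)


def relative_error(theta, ticks=10, a0=1.0):
    exact = continuous_decay(a0, theta, ticks)
    return abs(discrete_decay(a0, theta, ticks) - exact) / exact


if __name__ == "__main__":
    for theta in (0.07, 0.2, 0.5):
        print(theta, round(relative_error(theta), 3))
```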

Any simulation of the model involves making explicit assumptions about 
the appropriate featural analysis of the input font. We have, for simplicity,
chosen the font and featural analysis employed by Rumelhart (1971) and by
Rumelhart and Siple (1974) and illustrated in Figure 4. Although the
experiments we have simulated employed different type fonts, presumably the basic
results do not depend on the particular font used. The simplicity of the
present analysis recommends it for the simulations.

We have endeavored to find a single set of parameter values for our model
which would allow us to account for all of the basic findings reviewed above.
In order to keep the search space to an absolute minimum, we have adopted
various restrictive simplifications. We have assumed that the weight
parameters, $\alpha_{ij}$ and $\gamma_{ij}$, depend only on the levels of nodes
$i$ and $j$ and on no other characteristics of their identity. This means, among
other things, that the excitatory connections between all letter nodes and all
of the relevant word nodes are equally strong, independent of the identity of the
words. Thus, for example, the degree to which the node for an initial 't' excites
the node for the word 'tock' is exactly the same as the degree to which it excites
the node for a word like 'this,' in spite of a substantial difference in frequency
of usage. To further simplify matters, two types of influences have been set to
zero, namely the word to letter inhibition and the letter to letter
inhibition. We have also assigned the same resting value to all of the letter
nodes, simply giving each node the value of zero. The resting value of nodes
at the word level has been set to a value between -.05 and 0, depending on
word frequency. The values of the remaining parameters have been fixed at the
values given in Table 1. In the simulations which follow, all parameters are
fixed at the values indicated in the table. The table also includes a brief
statement of the significance or rationale for the particular value assigned.
In some cases, fuller discussions are warranted, and are given in the context
of a discussion of the model's behavior in accounting for one effect or
another.

[Figure 4: the feature set of the simulation font and the letters of the
alphabet constructed from it.]

Figure 4. The features used to construct the letters in the font assumed
by the simulation program, and the letters themselves (from Rumelhart & Siple,
1974).
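The parameter restrictions described above can be summarized in a short Python sketch. The weight table is keyed only by the levels of the sending and receiving nodes, using the values listed in Table 1 (letter-to-word inhibition shown at .04, one of the two values the table lists), so every letter-word pair shares the same weight regardless of the words involved. The text says only that word resting levels lie between -.05 and 0 depending on frequency; the log-frequency mapping in `word_resting_level` is a hypothetical choice for illustration, not the authors' formula.

```python
import math

# Weights depend only on the levels of the two nodes, never on their identity.
WEIGHTS = {
    ("letter", "word"): {"excitatory": 0.07, "inhibitory": 0.04},
    ("word", "letter"): {"excitatory": 0.30, "inhibitory": 0.0},
    ("word", "word"): {"excitatory": 0.0, "inhibitory": 0.21},
    ("letter", "letter"): {"excitatory": 0.0, "inhibitory": 0.0},
}


def weight(sender_level, receiver_level, kind):
    return WEIGHTS[(sender_level, receiver_level)][kind]


def word_resting_level(freq, f_min, f_max, floor=-0.05):
    """HYPOTHETICAL mapping: linear in log frequency, from `floor` for the
    rarest word in the lexicon up to 0 for the most frequent one."""
    if f_max == f_min:
        return 0.0
    x = (math.log(freq) - math.log(f_min)) / (math.log(f_max) - math.log(f_min))
    return floor * (1.0 - x)


if __name__ == "__main__":
    # Every word gets the same excitation from a consistent letter ...
    print(weight("letter", "word", "excitatory"))
    # ... but a rare and a very frequent word differ in resting level
    # (counts per million here are illustrative only).
    print(word_resting_level(2, 2, 5000), word_resting_level(5000, 2, 5000))
```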

In order to account for the dependence of the phenomena of letter
perception on visual conditions and expectations, it is necessary to assume that
some parameters depend on these factors. The quality of the visual display is
assumed to influence the system in two ways. First of all, it may not be
possible for the visual system to extract all the features of the display if it
becomes too degraded. To capture this possibility, we allow the probability
of feature extraction to vary with the quality of the display. Once the
quality is sufficiently good for perfect feature extraction, the strength of the
effect exerted by the features is assumed to depend on such things as the
brightness, contrast, size, and retinal position of the display. The
parameters which reflect the differential strength of the effect of the input are
the feature to letter excitation and inhibition parameters. It is assumed that
these parameters increase and decrease together as visual quality increases or
decreases, but stay in the same ratio. To accommodate the fact that performance
depends in some conditions on the subjects' expectations, we have found it
sufficient to assume that one of the internal parameters of the model is under
subject control. As we shall see below, we are able to provide a straightforward
account of the effects of expectations about whether pronounceable nonwords
will be shown if we assume that subjects have control over the strength of the 
letter to word inhibition parameter. We will see why this is so below. In
any case, the parameters which are assumed to be influenced by visual
conditions or expectations are designated as variable in Table 1. As we go
along we will explore the effects of variations in these parameters on the
performance of the model.

Table 1

Parameter Values Used in the Simulations

Parameter                     Value        Remarks

Basic node characteristics
  decay rate                  .07          Scales time. Low value ensures adequacy
                                           of approximation of continuity.
  maximum activation          1.00         Scales activations.
  minimum activation          -.20         Small negative value allows rapid
                                           reactivation of inhibited units.
Resting levels
  letter level                0            Simplifying assumption.
  word level                  <0           Depends on frequency (range: 0 to -.05).
Input
  p of feature detection      var.         Depends on visual conditions.
  feature-letter excitation   var.         Depends on visual conditions.
  feature-letter inhibition   var.         Inhibition much stronger than excitation,
  E/I ratio                   1/30         so that one feature incompatible with a
                                           letter results in net bottom-up inhibition.
Letter-word influences
  excitation                  .07
  inhibition                  .04 or .21   Low value allows letter level to excite
                                           words with some letters incompatible with
                                           input. High value prohibits these
                                           activations.
Within-level inhibition
  word level                  .21          Large inhibitory interactions allow correct
                                           word to dominate total activity at word
                                           level.
  letter level                0            Simplifying assumption. Unnecessary because
                                           of strong inhibition from inappropriate
                                           features.
Word-letter feedback
  excitation                  .30
  inhibition                  0            Simplifying assumption.
Output
  integration rate            .05          Low rate lets units be quickly activated,
                                           then inhibited without becoming accessible.
Output exponentiation
  letter level                10           Scales relation of activation to
  word level                  20           p(correct). Larger value required to
                                           offset greater number of alternatives.
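As a rough illustration of how the basic node characteristics in Table 1
interact, the following Python sketch applies one cycle of the activation rule
to a single node. It assumes the form of the rule described earlier in this
article: positive net input drives activation toward the maximum, negative net
input drives it toward the minimum, and decay pulls it back toward the resting
level. The function names are our own, and the net input is supplied directly
rather than computed from features, letters, and words.

```python
DECAY = 0.07     # decay rate (Table 1)
MAX_ACT = 1.00   # maximum activation M
MIN_ACT = -0.20  # minimum activation m

def update(a, net_input, rest=0.0):
    """One time cycle for a single node (assumed form of the rule)."""
    if net_input > 0:
        effect = net_input * (MAX_ACT - a)   # push toward the maximum
    else:
        effect = net_input * (a - MIN_ACT)   # push toward the minimum
    return a - DECAY * (a - rest) + effect

def run(net_input, cycles, a0=0.0, rest=0.0):
    """Activation history of one node under a constant net input."""
    a, history = a0, []
    for _ in range(cycles):
        a = update(a, net_input, rest)
        history.append(a)
    return history
```

With a constant positive net input n and a resting level of 0, the activation
levels off where growth and decay balance, at nM/(decay + n); for n = .1 this
is .1/.17, or about .59. With zero input, activation simply decays toward rest
by a factor of .93 per cycle.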

Finally, our simulations have been restricted to four-letter words. We
have equipped our simulation program with knowledge of 1,179 four-letter words
occurring at least 2 times per million in the Kucera and Francis word count
(1967). Plurals, inflected forms, first names, proper names, acronyms,
abbreviations, and occasional unfamiliar entries arising from apparent
sampling flukes have been excluded. This sample appears to be sufficient to
reflect the essential characteristics of the language and to show how the
statistical properties of the language can affect the process of perceiving
letters in words.

An example. For the purposes of this example, imagine that the word WORK
has been presented to the subject and that the subject has extracted the
features shown in Figure 5. In the first three letter positions the features
of the letters W, O, and R have been completely extracted. In the final
position a set of features consistent with the letters K and R has been
extracted, with the features in one portion of the pattern unavailable. We
wish now to chart the activity of the system resulting from this presentation.
Figure 6 shows the time course of the activations for selected nodes at the
word and letter levels, respectively.

At the word level, we have charted the activity levels of the nodes for
the words 'work', 'word', 'wear', and 'weak'. Note first that 'work' is the
only word in the lexicon consistent with all the presented information. As a
result, its activation level is the highest and reaches a value of .8 through
the first 40 time cycles. The word 'word' is consistent with the bulk of the
information presented and, as a result, first rises and later, as a result of
competition with 'work', is pushed back down below its resting level. The
words 'wear' and 'weak' are consistent with the information presented in the
first and fourth letter positions, but inconsistent with the information in
letter positions 2 and 3. Thus, the activations of these nodes drop to a
rather low level. This level is not quite as low, of course, as the activation
level of words such as 'gill', which contain nothing in common with the
presented information. Although not shown in the figure, these words attain
near-minimum activation levels of about -.20 and stay there as long as the
stimulus stays on. Returning to 'wear' and 'weak', we note that these words are
equally consistent with the presented information and thus drop together for
the first 9 or so time units. At this point, however, top-down information
has determined that the final letter is K and not R. As a result, the word
'weak' becomes more similar to the pattern at the letter level than the word
'wear' and begins to gain a slight advantage over 'wear'. This result occurs
in the model because, as the word 'work' gains in activation, it feeds
activation back down to the letter level to strengthen the 'k' over the 'r'.
The strengthened 'k' continues to feed activation into the word level and
strengthen consistent words. The words containing 'r' continue to receive
inhibition from the words consistent with 'k', and are therefore ultimately
weakened, as illustrated in the lower panel of the figure.

Figure 5. A hypothetical set of features which might be extracted on a
trial in an experiment on word perception.

Figure 6. The time course of activations of selected nodes at the word
and letter levels, after extraction of the features shown in Figure 5.

One of the characteristics of the parameter set we have adopted is that
feature to letter inhibition is 30 times stronger than feature to letter
excitation (see Table 1). This ratio ensures that as soon as a feature is
detected which is inconsistent with a particular letter, that letter receives
relatively strong net bottom-up inhibition. Thus, in our example, the
information extracted clearly disconfirms the possibility that the letter D
has been presented in the fourth position, and thus the activation level of
the 'd' node decreases quickly to near its minimum value. However, the
bottom-up information from the feature level supports both 'k' and 'r' in the
fourth position. Thus, the activation level for each of these nodes rises
slowly. These activation levels, along with those for 'w', 'o', and 'r', push
the activation level of 'work' above zero, and it begins to feed back; by
about time cycle 4 it is beginning to push the 'k' above the 'r' (WORR is not
a word). Note that this separation occurs just before the words 'weak' and
'wear' separate. It is this feedback that causes them to separate.
Ultimately, the 'r' reaches a level well below that of 'k', where it remains,
and the 'k' pushes toward a .8 activation level. Remember that, for purposes
of simplicity, the word to letter inhibition and the intra-letter level
inhibition have both been set to 0. Thus, 'k' and 'r' both co-exist at
moderately high levels, the 'r' fed only from the bottom up and the 'k' fed
from both bottom-up and top-down.

Although this example is not too realistic in that we assumed that only 
partial information was available in the input for the fourth letter position, 
whereas full information is available at the other letter positions, it does 
illustrate many of the important characteristics of the model. It shows how 
ambiguous sensory information can be disambiguated by top-down processes. 
Here we have a very simple mechanism capable of applying knowledge of words in 
the perception of their component letters. 
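The WORK example can be approximated with a toy program. The sketch below is
not the simulation program described here: it assumes a hypothetical
four-word lexicon (work, word, weak, wear), replaces the feature level with a
hypothetical fixed bottom-up input (+.05 for letters compatible with the
extracted features, -.30 for all others), sets every word's resting level to
0, and uses the Table 1 values for decay, activation bounds, letter-word
excitation, the low letter-word inhibition value, word-level inhibition, and
word-letter feedback. Both levels are updated synchronously from the previous
cycle's activations, and only positive activations are passed between nodes.

```python
import string

DECAY, MAX_ACT, MIN_ACT = 0.07, 1.00, -0.20
LW_EXC, LW_INH = 0.07, 0.04   # letter-to-word excitation, low inhibition value
WW_INH = 0.21                 # inhibition among word nodes
WL_EXC = 0.30                 # word-to-letter feedback
LEXICON = ["work", "word", "weak", "wear"]   # hypothetical toy lexicon
LETTERS = string.ascii_lowercase

def step(a, net, rest=0.0):
    """One cycle of the assumed activation rule for a single node."""
    effect = net * (MAX_ACT - a) if net > 0 else net * (a - MIN_ACT)
    return a - DECAY * (a - rest) + effect

def simulate(candidates, cycles=40, consistent=0.05, inconsistent=-0.30):
    """candidates[p] lists the letters compatible with the features extracted
    in position p; the fixed input values are hypothetical stand-ins."""
    feat = [{c: consistent if c in cands else inconsistent for c in LETTERS}
            for cands in candidates]
    let = [{c: 0.0 for c in LETTERS} for _ in candidates]
    wrd = {w: 0.0 for w in LEXICON}
    for _ in range(cycles):
        # Only positive activations influence other nodes.
        out_let = [{c: max(a, 0.0) for c, a in pos.items()} for pos in let]
        out_wrd = {w: max(a, 0.0) for w, a in wrd.items()}
        total_wrd = sum(out_wrd.values())
        new_wrd = {}
        for w, a in wrd.items():
            net = -WW_INH * (total_wrd - out_wrd[w])     # word competition
            for p, ch in enumerate(w):
                active = sum(out_let[p].values())
                net += LW_EXC * out_let[p][ch]           # consistent letter
                net -= LW_INH * (active - out_let[p][ch])  # other letters
            new_wrd[w] = step(a, net)
        new_let = []
        for p, pos in enumerate(let):
            new_let.append({
                c: step(a, feat[p][c] + WL_EXC * sum(
                    out_wrd[w] for w in LEXICON if w[p] == c))
                for c, a in pos.items()})
        let, wrd = new_let, new_wrd
    return let, wrd

letters, words = simulate(["w", "o", "r", "kr"])
```

In this toy setting, 'work' ends up as the most active word, 'k' rises above
'r' in the fourth position because only 'k' receives strong feedback, and 'd'
stays below zero. The exact time course will differ from Figure 6, since the
lexicon and inputs are hypothetical.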
On Making Responses

One of the more problematic aspects of a model such as this one is a
specification of how these relatively complex patterns of activity might be
related to the content of percepts and the sorts of response probabilities we
observe in experiments. We assume that responses, and perhaps the contents of
perceptual experience, depend on the temporal integration of the pattern of
activation over all of the nodes. The integration process is assumed to occur
slowly enough that brief activations may come and go without necessarily
becoming accessible for purposes of responding or entering perceptual
experience. However, as the activation lasts longer and longer, the
probability that it will be reportable increases. Specifically, we think of
the integration process as taking a running average of the activation of the
node, averaged over the immediately preceding time interval:

$$\bar{a}_i(t) = \int_{-\infty}^{t} a_i(x)\, e^{-(t-x)r}\, r \, dx \qquad (5)$$

The parameter r represents the relative weighting given to old and new
information. Larger values of r correspond to larger weight for new
information.

Response strength, in the sense of Luce's choice model (Luce, 1959), is an
exponential function of the running average activation:

$$S_i(t) = e^{\mu \bar{a}_i(t)} \qquad (6)$$

The parameter μ determines how rapidly response strength grows with increases
in activation. Following Luce's formulation, we assume that the probability
of making a response based on node i is given by

$$p(R_i, t) = \frac{S_i(t)}{\sum_{l \in L} S_l(t)} \qquad (7)$$

where L represents the set of nodes competing at the same level with node i.
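Equations 5 through 7 can be sketched in discrete time. The sketch below
assumes a per-cycle update of the running average, moving it a fraction r of
the way toward the current activation and starting from 0 (the letter-level
resting value), as an approximation of the integral in Equation 5. It uses the
integration rate of .05 and the letter-level exponent μ = 10 from Table 1; the
function names are illustrative only.

```python
import math

def running_average(activations, rate=0.05):
    """Discrete stand-in for Eq. 5: exponentially weighted running average,
    updated once per cycle with integration rate r."""
    avg, out = 0.0, []
    for a in activations:
        avg += rate * (a - avg)   # weight r on new input, 1 - r on the past
        out.append(avg)
    return out

def response_probabilities(avg_acts, mu=10.0):
    """Eqs. 6 and 7: exponentiate each running average, then apply the
    Luce choice rule over the competing nodes."""
    strengths = {k: math.exp(mu * v) for k, v in avg_acts.items()}
    total = sum(strengths.values())
    return {k: s / total for k, s in strengths.items()}
```

For example, `response_probabilities({"k": 0.6, "r": 0.3, "d": -0.15})` gives
'k' by far the largest share; in the model proper, the competing set L would
contain every letter node in the position, not just three.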

Most of the experiments we will be considering test subjects' performance
on one of the letters in a word, or on one of the letters in some other type
of display. In accounting for these results, we have adopted the assumption
that responding is always based on the output of the letter level, rather than
the output of the word level or some combination of the two. Thus, with
regard to the previous example, it is useful to look at the "output values"
for the letter nodes 'r', 'k', and 'd'. Figure 7 shows the output values for
these simulations. The output value is the probability that, if a response
were initiated at time t, the letter in question would be selected as the
output or response from the system. As intended, these output values grow
somewhat more slowly than the letter activations themselves, but eventually
come to reflect the activations of the letter nodes, as they reach and hold
their asymptotic values.

Comments on Related Formulations 

Before turning to the applications of the model, some comments on the
relationship of this model to other models extant in the literature are in
order. We have tried to be synthetic. We have taken ideas from our own
previous work and from the work of others in the literature. In what follows,
we have attempted to identify the sources of most of the assumptions of the
model and to show in what ways our model differs from the models we have
drawn on.

Figure 7. "Output values" for the letters 'r', 'k', and 'd', after
presentation of the display shown in Figure 5.

First of all, we have adopted the approach of formulating the model in
terms which are similar to the way in which such a process might actually be
carried out in a neural or neural-like system. We do not mean to imply that
the nodes in our system are necessarily related to the behavior of individual
neurons. We will, however, argue that we have kept the kinds of processing
involved well within the bounds of capability for simple neural circuits. The
approach of modeling information processing in a neural-like system has
recently been advocated by Szentagothai and Arbib (1975), and is embodied in
many of the papers presented in the forthcoming volume by Hinton and Anderson
(in press), as well as many of the specific models mentioned below.

One case in point is the work of Levin and Eisenstadt (1975) and Levin
(1976). They have proposed a parallel computational system capable of
interactive processing which employs only excitation and inhibition as its
"currency." Although our model could not be implemented exactly in the format
of their system (called Proteus), it is clearly in the spirit of their model
and could readily be implemented within a variant of the Proteus system.

In a recent paper, McClelland (1979) has proposed a cascade model of
perceptual processing in which activations on each level of the system drive
those at the next higher level of the system. This model has the properties
that partial outputs are continuously available for processing and that every
level of the system processes the input simultaneously. The present model
certainly embodies these assumptions. It also generalizes them, permitting
information to flow in both directions simultaneously.
Hinton (1977) has developed a relaxation model for visual perception in
which multiple constraints interact by means of incrementing and decrementing
real-numbered values associated with various interpretations of a portion of
the visual scene, in an attempt to attain a maximally consistent
interpretation of the scene. Our model can be considered a sort of relaxation
system in which activation levels are manipulated to get an optimal
interpretation of an input word.

James Anderson and his colleagues (Anderson, 1977; Anderson, Silverstein,
Ritz, & Jones, 1977) and Kohonen and his colleagues (Kohonen, 1977) have
developed a sort of pattern recognition system which they call an associative
memory system. Their system shares a number of commonalities with ours. One
thing the models share is the scheme of adding and subtracting weighted
excitation values to generate output patterns which represent cleaned-up
versions of the input patterns. In particular, our $\alpha_{ij}$ and
$\gamma_{ij}$ correspond to the matrix elements of the associative memory
models. Our model differs in that it has multiple levels and employs a
non-linear cumulation function similar to one suggested by Grossberg (1978),
as mentioned above.

Our model also draws on earlier work in the area of word perception.
There is, of course, a strong similarity between this model and the logogen
model of Morton (1969). What we have implemented might be called a
hierarchical, non-linear logogen model with feedback between levels and
inhibitory interactions among logogens at the same level. We have also added
dynamic assumptions which are lacking from the logogen model.
The notion that word perception takes place in a hierarchical information
processing system has, of course, been advocated by several researchers
interested in word perception (Adams, 1979; Estes, 1975; LaBerge & Samuels,
1974; Johnston & McClelland, in press; McClelland, 1976). Our model differs
from those proposed in many of these papers in that processing at different
levels is explicitly assumed to take place in parallel. Many of the models
are not terribly explicit on this topic, although the notion that partial
information could be passed along from one level to the next, so that
processing could go on at the higher level while it was continuing at the
lower level, had been suggested by McClelland (1976). Our model also differs
from all of these others, except that of Adams (1979), in assuming that there
is feedback from the word level to the letter level. The general formulation
suggested by Adams (1979) is quite similar to our own, although she postulates
a different sort of mechanism for handling pseudowords (excitatory connections
among letter nodes) and does not present a detailed model.

Our mechanism for accounting for the perceptual facilitation of
pseudowords involves, as we will see below, the integration of feedback from
partial activation of a number of different words. The idea that pseudoword
perception could be accounted for in this way is similar to the assumptions of
Glushko (1979), who suggested that partial activation and synthesis of word
pronunciations could account for the process of constructing a pronunciation
for a novel pseudoword.

The feature extraction assumptions and the bottom-up portion of the word
recognition model are nearly the same as those employed by Rumelhart (1970,
1971) and Rumelhart and Siple (1974). The interactive feedback portion of the
model is clearly one of the class of models discussed by Rumelhart (1977) and
could be considered a simplified control structure for expressing the model
proposed in that paper.

The Word Advantage and the Effects of Visual Conditions

As we noted previously, word perception has been studied under a variety
of different visual conditions, and it is apparent that different conditions
produce different results. The advantage of words over nonwords appears to be
largest under conditions in which a bright, high-contrast target is followed
by a patterned mask with similar characteristics. The word advantage appears
to be considerably smaller when the target presentation is dimmer or otherwise
degraded and is followed by a blank white field.

Typical data demonstrating these points (from Johnston & McClelland,
1973) are presented in Table 2. Forced-choice performance on letters in words
is compared to performance on letters embedded in a row of #'s (e.g., READ vs.
#E##). The #'s serve as a control for lateral facilitation and/or inhibition.
(The latter factor appears to be important under dim target/blank mask
conditions.)

Target durations were adjusted separately for each condition, so it is
only the pattern of differences within display conditions that is meaningful.
What the data show is that a 15% word advantage was obtained in the bright
target/patterned mask condition, and only a 5% word advantage in the dim
target/blank mask condition. Massaro and Klitzke (1979) obtained effects of
about the same size. Various aspects of these results have also been
corroborated in two other studies (Juola, Leavitt & Choe, 1974; Taylor &
Chabot, 1978).

Table 2

Effect of Display Conditions on Probability of Correct Forced Choices
in Word and Letter Perception, from Johnston & McClelland, 1973

                                    Display Type
Visual Conditions                 Word    Letter with #'s
Bright Target/Patterned Mask      .80     .65
Dim Target/Blank Mask             .78     .73

To understand the difference between these two conditions, it is important
to note that in order to get about 75 percent performance in the no-mask
condition, the stimulus must be highly degraded. Since there is no patterned
mask, the iconic trace presumably persists considerably beyond the offset of
the presentation. The effect of the blank mask is simply to reduce the
contrast of the icon by summating with it. Thus, the limit on performance is
not so much the amount of time available in which to process the information
as it is the quality of the information made available to the system. In
contrast, when a patterned mask is employed, the mask interrupts the iconic
trace and produces spurious inputs which can serve to disrupt the processing.
Thus, in the bright target/patterned mask conditions, the primary limitation
on performance is the time in which the information is available to the
system rather than the quality of the information presented. This distinction
between the way in which blank masks and patterned masks interfere with
performance has previously been made by a number of investigators, including
Rumelhart (1970) and Turvey (1973). We now consider each of these sorts of
conditions in turn.

Word Perception Under Conditions of Degraded Input 

In conditions of degraded (but not abbreviated) input, the role of the 
word level is to selectively reinforce possible letters consistent with the 
visual information extracted which are also consistent with the words in the 
subject's vocabulary. Recall that the task requires the subject to choose 
between two letters which (on word trials) both make a word with the rest of
the context. There are two distinct cases to consider. Either the featural
information extracted about the to-be-probed letter is sufficient to
distinguish between the alternatives, or it is not. Whenever the featural
information is consistent with both of the forced-choice alternatives, any feedback
will selectively enhance both alternatives, but will not permit the subject to 
improve his ability to distinguish between them. When the information 
extracted is inconsistent with one of the alternatives, there is nothing for 
the model to do if we assume that the subject can actually use the extracted 
feature information directly when it comes time to make the forced choice. 
However, the subject may not have direct access to this information. If we 
assume that forced-choice responses are based not on the feature information 
itself but on the subject's best guess about what letter was actually shown, 
then the model can produce a word advantage. The reason is that feedback from 
the word level will increase the probability of correct choice in those cases 
where the subject extracts information inconsistent with the incorrect alter- 
native, but consistent with a number of other letters. Thus, feedback would 
have the effect of helping the subject select the actual letter shown from 
several possibilities consistent with the set of extracted features. Consider
again, for example, the case of the presentation of WORK discussed above. In
this case, the subject extracted incomplete information about the final letter
consistent with both R and K. Assume that the forced choice the subject was
to face on this trial was between a D and a K. The account supposes that the
subject encodes a single letter for each letter position before facing the
forced choice. Thus, if the features of the final letter had been extracted
in the absence of any context, the subject would encode R or K equally often,
since both are equally compatible with the features extracted. This would
leave him with the correct response some of the time. But if he chose R
instead, he would enter the forced choice between D and K without knowing the
correct answer directly. When the whole word display is shown, the feedback
generated by the processing of all of the letters greatly strengthens the K,
increasing the probability that it will be chosen over the R, and thus
increasing the probability that the subject will proceed to the forced choice
with the correct response in mind.

Our interpretation of the small word advantage in blank mask conditions 
is a specific version of the early accounts of the word advantage offered by 
Wheeler (1970) and Thompson & Massaro (1973), before it was known that the
effect depends on masking. Johnston (1978) has argued that this type of 
account does not apply under patterned mask conditions. We are suggesting 
that it does apply to the small word advantage obtained under blank mask con- 
ditions like those of the Johnston and McClelland (1973) experiment. We will 
see below that the model offers a different account of performance under pat- 
terned mask conditions. 

We simulated this interpretation of the small word advantage obtained in 
blank mask conditions in the following way. A set of 40 pairs of four-letter 
words differing by a single letter was prepared. From these words correspond- 
ing control pairs were generated in which the critical letters from the word 
pairs were presented in non-letter contexts (#'s). Because they are presented
in non-letter contexts, we assume that these letters do not engage the word
processing system at all. In fact, we have run some simulations allowing such
stimuli to interact with word-level knowledge, and it makes little difference
to the overall results.
Each member of each pair of items was presented to the model 4 times,
yielding a total of 320 presentations of word stimuli and 320 presentations
of single letters. On each presentation, the simulation sampled a random
subset of the possible features to be detected by the system. The
probability of detection of each feature was set at .45. The values of the
feature to letter excitation and inhibition parameters were set at .005 and
.15, respectively. As noted previously, these values are in a ratio of 1 to
30, so that if any one of the fourteen features extracted is inconsistent with 
a particular letter, that letter receives net inhibition from the features, 
and is rapidly driven into an inactive state. 
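The consequence of the 1-to-30 ratio can be checked directly. In the sketch
below, each letter is represented as a hypothetical 14-element tuple of 0/1
feature values (a stand-in for the letter font used by the model), each
feature is detected independently with probability .45, and a candidate
letter's net bottom-up input is .005 per detected feature it agrees with minus
.15 per detected feature it contradicts. Because 14 × .005 = .07 is less than
.15, a single contradicting feature outweighs all the agreeing ones.

```python
import random

P_DETECT = 0.45      # probability each feature is detected
FEAT_EXC = 0.005     # excitation per consistent detected feature
FEAT_INH = 0.15      # inhibition per inconsistent detected feature (30x)
N_FEATURES = 14

def sample_detected(rng):
    """Indices of the features detected on one presentation."""
    return [f for f in range(N_FEATURES) if rng.random() < P_DETECT]

def feature_net_input(shown, candidate, detected):
    """Net bottom-up input to `candidate` given the detected features of
    `shown`; both are hypothetical 14-tuples of 0/1 feature values."""
    agree = sum(1 for f in detected if shown[f] == candidate[f])
    disagree = len(detected) - agree
    return FEAT_EXC * agree - FEAT_INH * disagree
```

A letter differing from the displayed one in a single feature gets net
excitation only on presentations where that feature happens to go undetected.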

For simplicity, the features were treated as a constant input which 
remained on while letter and word activations (if any) were allowed to take 
place. At the end of 50 processing cycles, output was sampled. Sampling 
results in the selection of one letter to fill each position; the selected
letter is assumed to be the only thing the subject takes away from the target 
display. 

The forced choice is assumed to occur as follows. The subject compares
the letter selected for the appropriate position against the forced-choice 
alternatives. If the letter selected is one of the alternatives, then that 
alternative is selected. If it is not one of the alternatives, then one of 
the two alternatives is simply picked at random. 
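The forced-choice rule, combined with the assumption that only one encoded
letter survives per position, can be sketched as follows. Here
`encode_probs` is a hypothetical distribution over the letter encoded for the
probed position. In the WORK example with a D/K choice, features compatible
with R and K only would give R and K probability .5 each, for an expected
accuracy of .5 + .5 × .5 = .75; feedback that raises the probability of
encoding K raises accuracy correspondingly.

```python
import random

def forced_choice(encoded, alternatives, rng):
    """Pick the encoded letter if it is an alternative; otherwise guess."""
    if encoded in alternatives:
        return encoded
    return rng.choice(alternatives)

def accuracy(encode_probs, correct, foil, trials=20000, seed=0):
    """Monte Carlo estimate of forced-choice accuracy."""
    rng = random.Random(seed)
    letters = list(encode_probs)
    weights = [encode_probs[c] for c in letters]
    hits = 0
    for _ in range(trials):
        encoded = rng.choices(letters, weights)[0]
        if forced_choice(encoded, (correct, foil), rng) == correct:
            hits += 1
    return hits / trials

def expected_accuracy(p_correct, p_foil=0.0):
    """Closed form: right when the shown letter is encoded, wrong when the
    foil is encoded, a coin flip when any other letter is encoded."""
    return p_correct + 0.5 * (1.0 - p_correct - p_foil)
```

The Monte Carlo estimate and the closed form should agree to within sampling
error.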

The simulation was run twice, once using the low value of letter to word
inhibition listed in Table 1 and once using the high value. The results were
different in the two cases. When the small letter to word inhibition value
was used, the letters embedded in words were 78% correct, whereas those in
#'s were 68% correct, a 10% difference. When the larger value of letter to
word inhibition was used, the two conditions showed no difference. The reason
for this difference is as follows. Under conditions in which incomplete
feature information is extracted from the display, multiple letters become
active in each position. When the letter to word inhibition is strong, these
activations keep any word from becoming activated. For example, suppose that
'e', 'o', 'c', and 'q' were all partially activated in the second position
after presentation of the word READ. Then the activations of 'o', 'c', and
'q' would inhibit the node for 'read', the activations of 'e', 'c', and 'q'
would inhibit the node for 'road', and so on. Other partial activations in
other positions would have similar effects. Thus, few words ever receive net
excitatory input, no feedback is generated, and little advantage of words over
letters emerges. When the letter to word inhibition is weak, on the other
hand, words which are consistent with one of the active letters in each
position can become active, thereby allowing for facilitation by feedback.
If, as we have assumed, the letter to word inhibition parameter is under the
subject's control, then this would be a situation in which it would be
advantageous for subjects to use a small value of this parameter. Thus, we
would assume that under conditions of degraded input subjects would be
inclined to adopt a low value of letter to word inhibition, with the effect
that partial activation of multiple possible letters in each position would
permit the activation of a set of possible words.
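The READ example can be worked through numerically. Assume, hypothetically,
that every letter of READ sits at activation .2 and that 'o', 'c', and 'q'
are also at .2 in the second position. The sketch computes the bottom-up net
input to a word node as .07 times each consistent active letter minus the
letter to word inhibition times each inconsistent active letter. With the low
value (.04), 'read' receives .056 - .024 = .032; with the high value (.21), it
receives .056 - .126 = -.070.

```python
LW_EXC = 0.07                      # letter-to-word excitation (Table 1)
LW_INH_LOW, LW_INH_HIGH = 0.04, 0.21

def word_net_input(word, letter_acts, lw_inh):
    """Bottom-up net input to one word node. letter_acts[p] maps letters to
    activations in position p; only positive activations are sent."""
    net = 0.0
    for pos, acts in enumerate(letter_acts):
        for letter, a in acts.items():
            if a <= 0:
                continue
            net += LW_EXC * a if letter == word[pos] else -lw_inh * a
    return net

# Hypothetical partial activations after presentation of READ.
acts = [{"r": 0.2}, {"e": 0.2, "o": 0.2, "c": 0.2, "q": 0.2},
        {"a": 0.2}, {"d": 0.2}]
```

By symmetry, 'road' receives the same net input as 'read' here; further
partial activations in other positions would push both values lower.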

Apparently, the low value of letter to word inhibition produced a larger
effect in the simulation than is observed in experiments. However, there are,
as Johnston (1978) has pointed out, a number of reasons why an account such as
the one we have offered would overestimate the size of the word advantage.
For one thing, subjects may occasionally be able to retain an impression of
the actual visual information they have been able to extract. On such
occasions, feedback from the word level will be of no further benefit.
Second, even if subjects only retain a letter identity code, they may tend to
choose the forced-choice alternative which is most similar to the letter
encoded, instead of simply guessing when the letter encoded is not one of the
two choice alternatives. Since the letter encoded will tend to be similar to
the letter shown, this would tend to result in a greater probability correct
and less of a chance for feedback to increase accuracy of performance. It is
hard to know exactly how much these factors should be expected to reduce the
size of the word advantage under these conditions, but they should reduce it
somewhat, bringing our simulation more closely in line with the results.

Word Perception Under Patterned Mask Conditions

When a high quality display is followed by a patterned mask, we assume 
that the bottleneck in performance does not come in the extraction of feature 
information from the target display. Thus, in our simulation of these
conditions, we assume that all of the features presented can be extracted on
every trial. The limitation on performance comes from the fact that the
activations produced by the target are subject to disruption and replacement
by the mask before they can be translated into a permanent form suitable for
overt report. This general idea was suggested by Johnston and McClelland
(1973), and considered by a variety of other investigators, including Carr et
al. (1978), Massaro and Klitzke (1979), and others. On the basis of this idea,
a number of possible reasons for the advantage for letters in words have been
suggested. One is that letters in words are for some reason translated more
quickly into a nonmaskable form (Johnston & McClelland, 1973; Massaro &
Klitzke, 1979). Another is that words activate representations removed from
the direct effects of visual patterned masking (Johnston & McClelland, 1973,
in press; Carr et al., 1978; McClelland, 1976). In the interactive activation
model, the reason letters in words fare better than letters in nonwords is
that they benefit from feedback which can either drive them to higher
activation levels or keep them active longer in the face of the inhibitory
influences of masking, or both. In either case, the probability that the
activated letter representations will be correctly encoded is increased.

To understand how this account works in detail, consider the following
example. Figure 8 shows the operation of our model for the letter E both in
an unrelated letter context and in the context of the word READ for a visual
display of moderately high quality. We assume that display conditions are
sufficient for complete feature extraction, so that only the letters actually
contained in the target receive net excitatory input on the basis of feature
information. After some number of cycles have gone by, the mask is presented
with the same parameters as the target. The mask simply replaces the target
display at the feature level, resulting in a completely new input to the
letter level. This input, because it contains features incompatible with the
letter shown in all four positions, immediately begins to drive down the
activations at the letter level. After only a few more cycles, these
activations drop below resting level in both cases. Note that the correct
letter was activated briefly, and no competing letter was activated. However,
because of the sluggishness of the output process, these activations do not
necessarily result in a high probability of correct report. As shown in the
right half of the figure, the probability of correct report reaches a maximum
after 16 cycles at a performance level far below the ceiling.

Figure 8. Activation functions (top) and output values (bottom) for the
letter E, in unrelated context and in the context of the word READ.

When the letter is part of a word (in this case, READ), the activation of 
the letters results in rapid activation of one or more words. These words, in
turn, feed back to the letter level. This results in a higher net activation
level for the letter embedded in the word. Moreover, since the letter embed- 
ded in a word has feedback from the word level to help sustain its activation, 
it is less readily displaced by the mask. This effect is not visible in the 
Figure. However, as the input strength is increased and the activations begin 
to level off, the difference between these two functions is increasingly in 
persistence and not in height of the activation curve. 
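The contrast between a letter alone and a letter in a word under a patterned
mask can be illustrated with a deliberately stripped-down sketch. It assumes a
single word whose four letters all behave alike, a hypothetical constant
feature input of +.10 during a 15-cycle target and -.30 once the mask replaces
it, and the Table 1 values for decay, activation bounds, letter-word
excitation, word-letter feedback, and output integration rate. Letter-to-word
inhibition and competing words are omitted because every active letter is
consistent with the single word; the unrelated-context case is modeled simply
as the absence of feedback.

```python
DECAY, MAX_ACT, MIN_ACT = 0.07, 1.00, -0.20
LW_EXC, WL_EXC = 0.07, 0.30
RATE = 0.05  # output integration rate

def step(a, net, rest=0.0):
    """One cycle of the assumed activation rule for a single node."""
    effect = net * (MAX_ACT - a) if net > 0 else net * (a - MIN_ACT)
    return a - DECAY * (a - rest) + effect

def letter_trace(with_word, target_cycles=15, total_cycles=40,
                 target_input=0.10, mask_input=-0.30):
    """Activation and running-average output of one target letter."""
    letter = word = avg = 0.0
    acts, outs = [], []
    for t in range(total_cycles):
        bottom_up = target_input if t < target_cycles else mask_input
        feedback = WL_EXC * max(word, 0.0) if with_word else 0.0
        # Four identical letters, so the word gets 4x one letter's input.
        word_net = 4 * LW_EXC * max(letter, 0.0)
        letter, word = step(letter, bottom_up + feedback), step(word, word_net)
        avg += RATE * (letter - avg)
        acts.append(letter)
        outs.append(avg)
    return acts, outs
```

In this simplified setting, the letter with word support is at least as
active as the isolated letter on every cycle, so its running-average output
peaks higher, even though both activations are eventually driven below zero by
the mask.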

We have carried out several simulations of the word advantage using the 
same stimulus list used for simulating the blank mask results. Since the 
internal workings of the model are completely deterministic as long as proba- 
bility of feature extraction is 1.0, it was only necessary to run each item 
through the model once to obtain the expected probability that the critical 
letter would be encoded correctly for each item, under each variation of 
parameters tried. 

One somewhat problematical issue involves deciding when to read out the 
results of processing and select candidate letters for each letter position. 
For simplicity, we have assumed that this occurs in parallel for all four 
letter positions and that the subject learns through practice to choose a time
to read out in order to optimize performance. We have assumed that readout
time may be set at a different point in different conditions, as long as they
are blocked so that the subject knows in advance what type of material will be
presented on each trial in the experiment. Thus, in simulating the Johnston
and McClelland (1973) results, we assumed different readout times for letters
in words and letters in unrelated context, with the different times selected
on the basis of practice to optimize performance on each type of material.
However, this is not a critical characteristic of the account. The word
advantage is only reduced slightly if the same readout time is chosen for both
single letters and letters in words, based on optimal performance averaged
over the two material types.
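The readout assumption amounts to picking the cycle at which expected accuracy
peaks. A minimal sketch, using made-up accuracy-by-cycle curves (not output of
the model), compares a separate readout time for each blocked condition with a
single readout time chosen to maximize accuracy averaged over both conditions.
The function names are hypothetical.

```python
def best_cycle(curve):
    """Index of the cycle with the highest expected accuracy."""
    return max(range(len(curve)), key=lambda t: curve[t])


def common_cycle(curves):
    """Single readout cycle maximizing accuracy averaged over conditions."""
    n = min(len(c) for c in curves)
    mean = [sum(c[t] for c in curves) / len(curves) for t in range(n)]
    return best_cycle(mean)


# Made-up accuracy-by-cycle curves, for illustration only.
word = [0.50, 0.72, 0.78, 0.81]
letter = [0.50, 0.64, 0.66, 0.60]

separate = word[best_cycle(word)] - letter[best_cycle(letter)]
t = common_cycle([word, letter])
shared = word[t] - letter[t]
print(f"advantage with separate readout times: {separate:.2f}")
print(f"advantage with common readout at cycle {t}: {shared:.2f}")
```

With these illustrative curves the shared readout time trims the advantage
only modestly; the actual magnitudes depend on the model's output curves, not
on this sketch.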

Employing the parameter values given in Table 1 with the high value of
the letter to word inhibition parameter and the moderate intensity input
parameters employed in the figure, we get 81 percent correct on the letters
embedded in words and 66 percent correct for letters in a # context or
isolated single letters, with a 15-cycle target presentation followed
immediately by the mask. The results were hardly affected at all by using the
lower value of letter to word inhibition, for reasons which will become
clearer when we consider the effect of this parameter on activation at the
word level in the section on the perception of pronounceable nonwords below.
For either parameter value, the model provides a close account of the
Johnston-McClelland data.

We have explored our model over a substantial range of input parameter 
values and have obtained large word advantages over single letters over much 
of the range. In the case of very high intensity inputs, however, we were 
forced to add an additional assumption to produce a reasonably large word 
advantage. As we already noted, when the input is very strong the effect of
feedback is to increase the persistence, rather than the height, of the letter
activation curves. But as we increase the intensity of the display we also
increase the potency of the mask. Eventually, the mask becomes so strong that
it can drive activations for both single letters and letters embedded in words
down so quickly that there is little difference between them. In order to get
the advantage in this case, it was necessary to adopt the assumption that
there is a maximum inhibitory effect that can be exerted from the feature to
the letter level. A value of .55 works out well over a large range of
stimulus intensities. Note that for low or moderate values of input strength
this parameter does not come into play, but it is quite important in the case
of a very high quality display.
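One way to express the ceiling on feature-to-letter inhibition is to clip the
summed inhibitory input before it is applied. This sketch rests on an
assumption the text does not spell out: that the .55 cap applies to the total
inhibition a letter node receives from inconsistent features. The function name
and the moderate-quality values (.005 and .15) are hypothetical; .04 and 1.2
are the strong feature-to-letter values quoted for the masking simulation.

```python
def feature_to_letter_input(n_consistent, n_inconsistent,
                            excitation, inhibition, max_inhibition=0.55):
    """Net feature-to-letter input for one letter node (sketch).

    Assumption: the cap applies to the summed inhibitory input from all
    inconsistent features, not to each feature separately."""
    total_inhibition = min(n_inconsistent * inhibition, max_inhibition)
    return n_consistent * excitation - total_inhibition


# Moderate quality (hypothetical values): the cap is never reached.
moderate = feature_to_letter_input(0, 3, excitation=0.005, inhibition=0.15)
# Very strong input: three inconsistent features would give 3.6 uncapped.
strong = feature_to_letter_input(0, 3, excitation=0.04, inhibition=1.2)
print(f"moderate: {moderate:.2f}, strong (capped): {strong:.2f}")
```

As the text says, the cap matters only at the high end: with moderate input
the summed inhibition stays below .55 and the function behaves as if uncapped.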

Such high quality input conditions represent a kind of upper extreme of
the range we have explored. We have also explored what happens with low
quality inputs, in which the stimulus quality is so poor that some of the
features may go undetected. These conditions also produce a reasonable word
advantage, but only as long as a lower value of letter to word inhibition is
adopted. As we saw before, with degraded input it is necessary to use a lower
value of letter to word inhibition in order to allow words to become activated
even when there are multiple letter possibilities active in some or all of the
letter positions.

Effects of Masking with Letters and Words 

Several studies in the recent literature examine the effects on word
perception of following the target with a mask which is composed of letters or
words, as opposed to a patterned stimulus containing nonsense squiggles or
nonletter printing characters (Jacobson, 1973, 1974; Taylor & Chabot, 1978).
In all three of these studies, it appears that performance on words is worse
when the mask contains unrelated letters or words than it is when the mask
contains nonletters, and there is little or no difference between words and
unrelated letter strings as masks, as long as the word is unrelated to the
target. One of us has recently collaborated in a study using the Reicher
procedure which shows analogous results (Johnston & McClelland, in press). In
addition, we find that the presence of letters in the mask hurts performance
on single letter displays very little compared to the extent to which it hurts
performance on letters in words. Thus, the word advantage over single letters
is reduced when a mask containing letters is used, compared to nonletter
patterned masks.

In these experiments, Johnston and McClelland (in press) compared
performance on single letters and letters in words under three types of
masking conditions: masking with words, masking with random letter sequences,
and masking with nonletter characters formed by recombining fragments of
letters. One experiment compared perception of letters and words when the
stimuli were masked with nonletter mask characters and when they were masked
with words. Each condition was tested in a separate block of trials, to allow
subjects to try to optimize their performance in each condition. As in most
word perception experiments, target duration was varied between subjects to
find a duration for each subject at which about 75% correct average
performance over all material types was achieved. The results, shown in
Table 3, indicate that there was a large word advantage with the nonletter
masks. This replicates the typical finding in such studies. The interesting
finding is that the word advantage is considerably reduced with word masks.
This is true even though the nonletter character masks contain the same set
of line segments occurring in the letters used in the word masks.
Table 3

Actual & Simulated Results
(Probability Correct Forced Choice)

                                      Target Type
                                  Word    Letter    Difference
Johnston & McClelland (in press)
  Experiment I
    Nonletter Mask                .86     .71       .15
    Word Mask                             .68       .06
  Experiment II
    Word Mask                     .78     .75       .03
    Letter Mask                   .78     .75       .03
  Experiment III
    Nonletter Mask                .86     .65       .21
    Letter Mask                   .79     .71       .08
Simulation
    Nonletter Mask                .90     .70       .20
    Letter Mask                   .76     .69       .06
    Word Mask                     .76     .69       .06

Note: In Experiment III, target duration was 10 msec longer with letter masks
than with nonletter masks, in order to produce the observed cross-over
interaction.

A second experiment compared performance on words and single letters
using two kinds of masks containing letters. In one, the letters spelled
words as in Experiment I; in the other they formed unrelated letter strings.
Both types of mask produced a very slight word advantage, and there was no
difference between them.

The third experiment compared performance on words and single letters
with the same nonletter masks used in the first experiment, and with masks
containing four unrelated letters. Target duration was set slightly longer in
the letter mask condition to achieve approximately the same overall percent
correct performance level in each of the two mask conditions. That is, target
duration was always set to be 10 msec longer with the letter mask than with
the feature (nonletter) mask. The manipulation was successful in eliminating
the overall difference between feature and letter mask conditions, but did not
eliminate the interaction of target and mask type. The size of the word
advantage over single letters was more than twice as great in the feature mask
condition as in the letter mask condition.

Our model provides a simple account of the main findings, as illustrated
in Figure 9. In the case of word targets, the letters in the mask become
active before the output reaches its maximum strength. These new activations
compete with the old ones produced by the target to reduce the probability of
correctly encoding the target letter. A secondary effect of the new letters
is to inhibit the activation of the word (or words) previously activated by
the target. This indirectly results in an increase in the rate of decay of the
target letters, because their top-down support is weakened. A tertiary effect
of the mask, if it actually contains a word, is to begin activating a new word
at the word level. These latter two effects do not actually come into play
until after the peak of the output function has already passed, so they have
no effect on performance.

Figure 9. Activation functions (top) and output probability curves (bottom)
for the letter O, both alone (left) and in the word MOLD (right), with
feature, letter, and word masks.

According to this interpretation, the major role of letters in the mask
is to compete at the letter level with the letters previously activated by the
target. Competition of this sort also happens with single letter targets, but
it has less of an effect in this case for the following reason. The
activations for single letter targets are not reinforced by the word level,
and so the bottom-up inhibition generated by the mask more quickly drives the
old activations down. By the time the mask has a chance to activate new
letters, the peak in the output function has already been reached. The new
letters definitely have an effect on the tail of the output function, but we
assume that subjects read out at or near the peak, so these differences are
irrelevant.

In preliminary attempts to simulate these results, we found that the
model was quite sensitive to the similarity of the letters in the target and
the feature arrays (be they letters or nonletters) in the mask. We therefore
tailored the nonletter mask characters to have the same number of features
different from the target letter they were masking as the mask letters had.
For this reason, it was not feasible to test a large number of different
items. Instead, we tested all four letters in the word MOLD. The letter mask
display was ARAT, and the four feature masks were constructed so that the
first had the same number of features in common with M as the letter A did,
the second had the same number of features in common with O as R did, etc.
For the word mask, we simply altered the lexicon of the program so that ARAT
"became" a word (if only such manipulations could be used on human subjects!).
Thus, we have tests of four different letters (M, O, L, and D) at each joint
level of target type (word, single letter) and mask type (feature, letter,
word), and all three mask types are exactly equated in their bottom-up
potency.

The results of the simulation are summarized in Table 3. In producing an
interaction of this magnitude, we had to assume very high levels of feature to
letter excitation and inhibition (.04 and 1.2, respectively). Under these
conditions, the bulk of the effect of feedback is to increase the persistence
(rather than the height) of the activation function. The strong input values
for the mask also permit the new letters in the mask to produce new
activations very rapidly at the letter level, thus contributing to the size of
the interaction.

The simulation results shown in Table 3 were produced using the strong
value (.21) of letter to word inhibition. It seems appropriate to use the
strong value since the subjects expected only words, as discussed in the next
section (with this value, the fact that ARAT is pronounceable is irrelevant to
the functioning of the model, as we shall see). In fact, though, the
simulation produces the interaction with both strong and weak letter to word
inhibition, although it is somewhat weaker with weak letter to word
inhibition. The reason for the difference has to do with the strength of the
secondary effect of the mask letters in inhibiting the word(s) activated by
the target, thereby removing the support of the activations of the letters in
the target word. With stronger letter to word inhibition, this effect is
stronger than when the letter to word inhibition is weak.

The Johnston & McClelland (in press) experiment was designed as a test of
a hierarchical model of word perception, in which there was no feedback from
the word level to the letter level. Instead, readout could occur from either
the letter level or the word level. The greater effectiveness of letter masks
was assumed to be due to activation of new letters which would provide
disruptive input to the word level. In our model, the greater effectiveness of
letter masks is also assumed to be due to activation of new letters, but for a
slightly different reason. Instead of interfering directly with the
representation at the word level, the new letters produce the bulk of their
effect by interfering with the readout of old activations at the letter level 
which are being maintained by feedback. We have not been able to think of a 
way of distinguishing these views, since they differ mainly in the level of 
the system from which readout occurs, something which may be very difficult to 
assess directly. In any case, it is clear that our model provides an account 
of the effect of mask letters, in addition to its account of the basic effects 
of patterned and unpatterned masks. 

Perception of Regular Nonwords 

One of the most important findings in the literature on word perception 
is that an item need not be a word in order to produce facilitation with 
respect to unrelated letter or single letter stimuli. The advantage for pseu- 
dowords over unrelated letters has been obtained in a very large number of 
studies (Aderman & Smith, 1971; Baron & Thurston, 1973; Carr, et al, 1978;
McClelland, 1976; Spoehr & Smith, 1975). The pseudoword advantage over single
letters has been obtained in three studies (Carr, et al, 1978; Massaro &
Klitzke, 1979; McClelland & Johnston, 1977).
As we have already noted, these effects appear to depend on subjects'
expectations. When subjects know that the stimuli include pseudowords, both
words and pseudowords have an advantage over unrelated letters (and single
letters) and the difference between words and pseudowords is quite small. In
some studies, no reliable difference is obtained (Spoehr & Smith, 1975; Baron
& Thurston, 1973; McClelland & Johnston, 1977) whereas in others, a difference
of up to about 6% has been reported (Carr, et al, 1978; Manelis, 1974;
McClelland, 1976).

Interestingly, when subjects do not expect pseudowords to be shown,
letters in these stimuli have no advantage over unrelated letters. Aderman
and Smith (1971) found that this was true when the subjects expected only
unrelated letters. Carr, et al (1978) replicated this effect, and added two
very interesting facts (Table 4). First, the word advantage over unrelated
letters can be obtained when subjects expect only unrelated letters, even
though letters in pseudowords show no advantage at all under these
conditions. Second, when subjects expect only words they perform quite poorly
on letters in pseudowords compared to unrelated letters.

At first glance, these data seem to suggest that there must be different
processing mechanisms responsible for the word and pseudoword effects. There
seems to be a word mechanism which is engaged automatically if the stimulus is
a word, and a pseudoword mechanism which is brought into play only if
pseudowords are expected. However, we will show that these results are
completely consistent with the view that there is a single mechanism for
processing both words and pseudowords, with a parameter which is under the
subject's control determining whether the mechanism will produce facilitation
only for words or for both words and pseudowords. First, we will examine how
the model accounts for the pseudoword advantage at all.

Table 4

Effect of Expected Stimulus Type
on the Word and Pseudoword Advantage over Unrelated Letters
(Difference in Probability Correct Forced Choice)
Carr, et al, 1978

                              Expectation
Target          Word     Pseudoword     Unrelated Letters
Word            .15      .15            .16
Pseudoword     -.03      .11           -.02

The Basic Pseudoword Advantage 

The model produces the facilitation for pseudowords by allowing them to
activate nodes for words which share more than one letter in common with the
display. When they occur, these activations produce feedback, just as in the 
case of words, strengthening the letters which gave rise to them. These 
activations occur in the model if the strength of letter to word inhibition is 
reasonably small compared to the strength of letter to word excitation. 

To see how this takes place in detail, consider a brief presentation of
the pseudoword MAVE, followed by a patterned mask (the pseudoword is one used
by Glushko, 1979, in developing the idea that partial activations of words are
combined to derive pronunciations of pseudowords). For this example, the
input parameters corresponding to the moderate quality display were used, in
conjunction with low letter to word inhibition. As illustrated in Figure 10,
presentation of MAVE results in the initial activation of 16 different words.
Most of these words, like 'have' and 'gave', share three letters in common
with MAVE. By and large, these words steadily gain in strength while the
target is on, and produce feedback to the letter level, sustaining the letters
which supported them.

Some of the words are weakly activated for a brief period of time before
they fall back below zero. These, typically, are words like 'more' and 'many'
which share only two letters with the target but are very high in frequency,
so they need little excitation before they exceed threshold. But, soon after
they exceed threshold, the total activation at the word level gets strong
enough to overcome the weak excitatory input, causing them to drop down just
after they begin to rise. Less frequent words sharing two letters with the
item displayed have a less exciting fate still. Since they start out at a
lower value, they generally fail to receive enough excitation to make it up to
threshold. Thus, words which share only two letters in common with the target
tend to exert a rather minimal influence on the amount of feedback being
generated. In general, then, the amount of feedback, and hence the amount of
facilitation, depends primarily on the activation of nodes for words which
share three letters with a displayed pseudoword. It is the nodes for these
words which primarily interact with the activations generated by the
presentation of the actual target display, so in what follows we will use the
term neighborhood to refer to the set of words which have three letters in
common with the target letter string.

Figure 10. Activation at the word level upon presentation of the nonword
MAVE.
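To make the neighborhood definition concrete, here is a sketch that finds the
neighbors of MAVE in a small, hypothetical word list. The model's own lexicon
was much larger, so the counts below are illustrative only; `shared_letters`
and `neighborhood` are names introduced for this example.

```python
# Hypothetical mini-lexicon; not the word list used in the simulations.
LEXICON = ["have", "gave", "save", "cave", "wave", "pave",
           "make", "male", "mare", "mate", "maze", "mane",
           "move", "more", "many", "mile"]


def shared_letters(a, b):
    """Count positions at which two equal-length strings have the same letter."""
    return sum(x == y for x, y in zip(a, b))


def neighborhood(target, lexicon):
    """Words sharing exactly three position-specific letters with target."""
    return [w for w in lexicon if shared_letters(target, w) == 3]


neighbors = neighborhood("mave", LEXICON)
print(len(neighbors), neighbors)
```

Here 'more' and 'many' share only two letters with MAVE, so they fall outside
the neighborhood even though their high frequency can give them a brief
activation.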

The amount of feedback a particular letter in a nonword receives depends,
in the model, on two primary factors and two secondary factors. The two
primary factors are the number of words in the entire nonword's neighborhood
which include the letter, and the number of words which do not. In the case
of the M in MAVE, for example, there are 7 words in the neighborhood of MAVE
which begin with M, so the 'm' node gets excitatory feedback from all of
these. These words are called the "friends" of the 'm' node in this case.
Because of competition at the word level, the amount of activation which these
words receive depends on the total number of words which share three letters
in common with the target. Those which share three letters with the target
but are inconsistent with 'm' (e.g., 'have') produce inhibition which tends to
limit the activation of the friends of 'm', and can thus be considered the
enemies of 'm'. These words also produce feedback which tends to activate
letters which were not actually presented. For example, activation from
'have' produces excitatory input to 'h', thereby producing some competition
with the 'm'. These activations, however, are usually not terribly strong.
No one word gets very strongly active, and so letters not in the actual
display tend to get fairly weak excitatory feedback. This weak excitation is
usually insufficient to overcome the bottom-up inhibition acting on
nonpresented letters. Thus, in most cases, the harm done by top-down
activation of letters which were not shown is minimal.
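Counting friends and enemies is a simple tally over the neighborhood. This
sketch reuses the 13 neighbors that a hypothetical mini-lexicon yields for
MAVE; the function name is illustrative, and results are keyed by
(position, letter) because a letter can occur in more than one position.

```python
# The 13 neighbors of MAVE in a hypothetical mini-lexicon.
NEIGHBORS = ["have", "gave", "save", "cave", "wave", "pave",
             "make", "male", "mare", "mate", "maze", "mane", "move"]


def friends_and_enemies(target, neighbors):
    """For each position, count neighbors that contain the target's letter
    in that position (friends) and neighbors that do not (enemies)."""
    result = {}
    for pos, letter in enumerate(target):
        friends = sum(w[pos] == letter for w in neighbors)
        result[(pos, letter)] = (friends, len(neighbors) - friends)
    return result


counts = friends_and_enemies("mave", NEIGHBORS)
for (pos, letter), (f, e) in counts.items():
    print(f"position {pos} '{letter}': {f} friends, {e} enemies")
```

Every letter has the same total of friends plus enemies (here 13), as noted in
the text; only the split between the two differs from position to position.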

A part of the effect we have been describing is illustrated in Figure 11.
Here, we compare the activations of the nodes for the letters in MAVE.
Without feedback, the four curves would be identical to the one "single
letter" curve included for comparison. So, although there is facilitation for
all four letters, there are definitely differences in the amount, depending on
the number of friends and enemies of each letter. Note that within a given
pseudoword, the total number of friends and enemies (i.e., the total number of
words with three letters in common) is the same for all the letters.

Figure 11. Activation functions for the letters 'a' and 'v' on presentation
of MAVE. The activation function for 'e' is indistinguishable from the
function for 'a', and that for 'm' is similar to that for 'v'. The activation
function for a letter alone or in unrelated context is included for
comparison.

There are two other factors which affect the extent to which a particular
word will become active at the word level when a particular pseudoword is
shown. Although the effects of these factors are only rather weakly reflected
in the activations at the letter level, they are nevertheless interesting to
note, since they indicate some synergistic effects which emerge from the
interplay of simple excitatory and inhibitory influences in the neighborhood.
These are the rich-get-richer effect and the gang effect. The rich-get-richer
effect is illustrated in Figure 12, which compares the activation curves for
the nodes for 'have', 'gave', and 'save' under presentation of MAVE. The
words differ in frequency, which gives them slight differences in baseline
activation. What is interesting is that the difference gets magnified, so that
at the point of peak activation there is a much larger difference. The reason
for the amplification can be seen by considering a system containing only two
nodes, 'a' and 'b', starting at different positive activation levels at time
t. Suppose that 'a' is stronger than 'b' at t. Then at t+1, 'a' will exert
more of an inhibitory influence on 'b' than 'b' exerts on 'a', since
inhibition of a given node is determined by the sum of the activations of all
units other than itself. This advantage for the initially more active nodes is
compounded further in the case of the effect of word frequency by the fact
that more frequent words creep above threshold first, thereby exerting an
inhibitory effect on the lower frequency words when they are still too weak to
fight back at all.
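The two-node argument can be checked with a small sketch: two word nodes
receive identical excitatory input and inhibit each other in proportion to the
other's positive activation, differing only in resting level (standing in for
frequency). The update rule is patterned on the model's activation equation,
but all parameter values and the names `update` and `two_node_race` are
hypothetical, and the input is simply left on rather than followed by a mask.

```python
MAX_ACT, MIN_ACT, DECAY = 1.0, -0.2, 0.07  # illustrative values only


def update(a, net, rest):
    """One cycle: excitation pushes toward MAX_ACT, inhibition toward
    MIN_ACT, and decay pulls activation back toward its resting level."""
    if net > 0:
        effect = net * (MAX_ACT - a)
    else:
        effect = net * (a - MIN_ACT)
    return a - DECAY * (a - rest) + effect


def two_node_race(rest_a=-0.05, rest_b=-0.10, excitation=0.1,
                  inhibition=0.21, cycles=200):
    """Two word nodes get identical excitatory input and inhibit each other
    in proportion to the other's positive activation. Returns the
    difference a - b at the start and after every cycle."""
    a, b = rest_a, rest_b
    diffs = [a - b]
    for _ in range(cycles):
        net_a = excitation - inhibition * max(b, 0.0)
        net_b = excitation - inhibition * max(a, 0.0)
        a, b = update(a, net_a, rest_a), update(b, net_b, rest_b)
        diffs.append(a - b)
    return diffs


diffs = two_node_race()
print(f"initial difference {diffs[0]:.3f}, final difference {diffs[-1]:.3f}")
```

With equal resting levels the two nodes stay exactly tied; it is the small
head start that mutual inhibition amplifies.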

Even more interesting is the gang effect, which depends on the coordinated
action of a related set of word nodes. This effect is depicted in Figure 13.
Here, the activation curves for the 'move', 'make', and 'save' nodes are
compared. In the language, 'move' and 'make' are of approximately equal
frequency, so their activations start out at about the same level. But they
soon pull apart. Similarly, 'save' starts out below 'move', but soon reaches a
higher activation. The reason for these effects is that 'make' and 'save' are
both members of gangs with several members, while 'move' is not. Consider
first the difference between 'make' and 'move'. The reason for the difference
is that there are several words which share the same three letters in common
with MAVE as 'make' does. In the list of words used in our simulations, there
are 6. These words all work together to reinforce the 'm', the 'a', and the
'e', thereby producing much stronger reinforcement for themselves. Thus,
these words make up a gang called the 'ma_e' gang. In this example, there is
also an '_ave' gang consisting of a different 6 words, of which 'save' is one.
All of these work together to reinforce the 'a', 'v', and 'e'. Thus, the 'a'
and 'e' are reinforced by two gangs, while the letters 'v' and 'm' are
reinforced by only one each. Now consider the word 'move'. This word is a
loner; there are no other words in its gang, the 'm_ve' gang. Although two of
the letters in 'move' receive support from one gang each, and one receives
support from both other gangs, the letters of 'move' are less strongly
enhanced by feedback than the letters of the members of the other two gangs.
Since continued activation of one word in the face of the competition
generated by all of the other partially activated words depends on the
activations of the component letter nodes, the words in the other two gangs
eventually gain the upper hand and drive 'move' back below the activation
threshold.

Figure 12. The rich-get-richer effect. Activation functions for the nodes for
'have', 'gave', and 'save', under presentation of MAVE.

Figure 13. The gang effect. Activation functions for 'move', 'make', and
'save' under presentation of MAVE.
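The gangs can be recovered mechanically by grouping neighbors according to the
position at which they differ from the pseudoword. This sketch uses the same
13 hypothetical neighbors of MAVE; `gangs` and `gang_support` are illustrative
names, and treating a gang as "several members" via `min_size=2` is an
assumption made for the example.

```python
from collections import defaultdict

# The 13 neighbors of MAVE in a hypothetical mini-lexicon.
NEIGHBORS = ["have", "gave", "save", "cave", "wave", "pave",
             "make", "male", "mare", "mate", "maze", "mane", "move"]


def gangs(target, neighbors):
    """Group neighbors by the position at which they differ from target.

    Assumes each neighbor differs from target in exactly one position.
    Each group is labelled with the shared letters and '_' for the
    differing position (e.g. 'ma_e')."""
    groups = defaultdict(list)
    for word in neighbors:
        pos = next(i for i, (x, y) in enumerate(zip(target, word)) if x != y)
        groups[target[:pos] + "_" + target[pos + 1:]].append(word)
    return dict(groups)


def gang_support(target, groups, min_size=2):
    """For each letter of target, count the gangs with at least min_size
    members that contain that letter in the same position."""
    support = []
    for pos, letter in enumerate(target):
        n = sum(1 for label, members in groups.items()
                if len(members) >= min_size and label[pos] == letter)
        support.append((letter, n))
    return support


groups = gangs("mave", NEIGHBORS)
for label, members in groups.items():
    print(label, members)
print(gang_support("mave", groups))
```

The output mirrors the text: 'a' and 'e' are backed by two multi-member gangs,
'm' and 'v' by one each, and 'move' forms a gang of one.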

As our study of the MAVE example illustrates, the pattern of activation
which is produced by a particular pseudoword is complex and idiosyncratic. In
addition to the basic friends and enemies effects, there are also the
rich-get-richer and the gang effects. These effects are primarily reflected in
the pattern of activation at the word level, but they also exert subtle
influences on the activations at the letter level. In general, though, the
main result is that when the letter to word inhibition is low, all four
letters in the pseudoword receive some feedback reinforcement. The result, of
course, is greater accuracy in reporting letters in pseudowords compared to
single letters.

The Role of Expectations

It should now be clear that variation in letter to word inhibition
produces different degrees of enhancement. When this parameter is small, the
pseudoword advantage is large, and when the parameter is large, the advantage
gets small. Indeed, if the letter to word inhibition is equal to three times
the letter to word excitation, then no four-letter nonword can activate the
node for any four-letter word. The reason is that it can have no more than
three letters in common with a word. The inhibition generated by the letter
which is different will cancel the excitation generated by the letters that
are the same.
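The cancellation argument is simple arithmetic on the net bottom-up input to a
word node, assuming that exactly the displayed letters are active: each
matching position contributes excitation and each mismatching position
contributes inhibition. In this sketch the excitation value of .07 is an
assumption; three times that value is .21, the strong letter to word
inhibition used in the simulations, and .04 is the weak value. The function
name is hypothetical.

```python
from fractions import Fraction


def net_letter_to_word(display, word, excitation, inhibition):
    """Net bottom-up input to a word node when exactly the displayed
    letters are active: matches excite, mismatches inhibit."""
    matches = sum(d == w for d, w in zip(display, word))
    return matches * excitation - (len(word) - matches) * inhibition


EXC = Fraction("0.07")    # assumed letter-to-word excitation
WEAK = Fraction("0.04")   # weak letter-to-word inhibition
STRONG = 3 * EXC          # inhibition equal to three times excitation (.21)

for inh in (WEAK, STRONG):
    to_have = net_letter_to_word("mave", "have", EXC, inh)
    to_cave = net_letter_to_word("cave", "cave", EXC, inh)
    print(f"inhibition {float(inh):.2f}: MAVE->have {float(to_have):+.2f}, "
          f"CAVE->cave {float(to_cave):+.2f}")
```

Exact fractions are used so that the cancellation at .21 is exact rather than
subject to floating-point rounding. Under weak inhibition even a two-letter
match such as MAVE->more receives a small positive input, consistent with the
brief activation of high-frequency words like 'more' described above.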

We can now account for Carr, et al's (1978) findings with pseudowords by
simply assuming that when subjects expect only words they will adopt a large
value of the letter to word inhibition parameter, but when they expect
pseudowords they adopt a small value. Apparently, when they expect unrelated
letter strings, at least of the type used in this experiment, they also adopt
a large value of letter to word inhibition. Perhaps this is the normal
setting, with a relaxation of letter to word inhibition only used if
pseudowords are known to occur in the list or when the stimulus input is very
degraded.

But we have still to consider what effects variation of letter to word
inhibition might have for word stimuli. If relaxation of letter to word
inhibition increases accuracy for letters in pseudowords, one might expect it
to do the same thing for letters in words. However, in general this is not the
case. Part of the reason is that the word shown still gets considerably more
activation than any other word, and tends to keep the activations of other
nodes from getting very strong. This situation is illustrated for the word
CAVE in Figure 14. A second factor is that partial activations of other words
are not an unmixed blessing. The words which receive partial activations all
produce inhibition which keeps the node for the word shown from becoming as
strongly activated as it would be otherwise. The third factor is that the
activation of any one word sharing three letters with the word shown
reinforces only three of the four letters in the display. For these reasons,
it turns out that the value of letter to word inhibition can vary from .04 to
.21 with very little effect on word performance.

Figure 14. Activity at the word level upon presentation of CAVE, with
weak letter to word inhibition.

Comparison of Performance on Words and Pseudowords 

Let us now consider the fact that the word advantage over pseudowords is 
generally rather small in experiments where the subject knows that the stimuli 
include pseudowords. Some fairly representative results, from the study of
McClelland and Johnston (1977), are illustrated in Table 5. The visual
conditions of the study were the same as those used in the patterned mask condition
in Johnston and McClelland (1973). Trials were blocked, so subjects could 
adopt the optimum strategy for each type of material. The slight word- 
pseudoword difference, though representative, is not actually statistically 
reliable in this study. 

Words differ from pseudowords in that they strongly activate one node at
the word level. While we would tend to think of this as increasing the amount
of feedback for words as opposed to pseudowords, there is the word-level
inhibition which must be taken into account. This inhibition tends to equalize
the total amount of activation at the word level between words and
pseudowords. With words, the word shown tends to dominate the pattern of
activity, thereby keeping all the words with three letters in common with it
from achieving the activation level they would reach in the absence of a node
activated by all four letters. The result is that the sum of the activations
of all the active units at the word level is not much different between the
two cases. Thus, CAVE produces only slightly more facilitation for its
constituent letters than MAVE, as illustrated in Figure 15.

Table 5

Actual and Simulated Results of the
McClelland & Johnston (1977) Experiments
(Probability Correct Forced Choice)

                            Target Type
                 Word    Pseudoword    Single Letter
Data
  High BF        .81     .79           .67
  Low BF         .78     .77           .64
  Average        .80     .78           .66
Simulation
  High BF        .81     .79           .67
  Low BF         .79     .77           .67
  Average        .80     .78           .67

In addition to the mere leveling effect of competition at the word level, 
it turns out that one of the features of the design of most studies comparing 
performance on words and pseudowords would operate in our model to keep per- 
formance relatively good on pseudowords. In general, most studies comparing 
performance on words and pseudowords tend to begin with a list of pairs of 
words differing by one letter (e.g., PEEL-PEEP), from which a pair of nonwords 
is generated differing from the original word pair by just one of the context 
letters, thereby keeping the actual target letters and as much of the context 
as possible the same between word and pseudoword items (e.g., TEEL-TEEP). A
previously unnoticed side-effect of this matching procedure is that it ensures
that the critical letter in each pseudoword has at least one friend, namely
the word from the matching pair which differs from it by one context letter.
In fact, most of the critical letters in the pseudowords used by McClelland
and Johnston tended to have relatively few enemies compared to the number of
friends. In general, a particular letter should be expected to have three
times as many friends as enemies. In the McClelland and Johnston stimuli, the
great majority of the stimuli had much larger differentials. Indeed, more
than half of the critical letters had no enemies at all.

The Puzzling Absence of Cluster Frequency Effects 

In the account we have just described, facilitation of performance on 
letters in pseudowords was explained by the fact that pseudowords tend to
activate a large number of words, and these words tend to work together to
reinforce the activations of letters. This account might seem to suggest that
pseudowords which have common letter-clusters, and therefore have several
letters in common with many words, would tend to produce the greatest
facilitation. However, this factor has been manipulated in a number of studies
and little has been found in the way of an effect. The McClelland and Johnston
study is one case in point. As Table 5 illustrates, there is only a slight
tendency for superior performance on high cluster frequency words. This
slight tendency is also observed in single letter control stimuli, suggesting
that the difference may be due to differences in perceptibility of the target
letters in the different positions, rather than cluster frequency per se. In
any case, the effect is very small. Other studies have likewise failed to
find any effect of cluster frequency (Spoehr & Smith, 1975; Manelis, 1974).
The lack of an effect is most striking in the McClelland and Johnston study,
since the high and low cluster frequency items differed widely in cluster
frequency as measured in a number of different ways.

In our model, the lack of a cluster frequency effect is due to the effect 
of mutual inhibition at the word level. As we have seen, this mutual inhibi- 
tion tends to keep the total activity at the word level roughly constant over 
a variety of different input patterns, thereby greatly reducing the advantage 
for high cluster frequency items. Items containing infrequent clusters will 
tend to activate few words, but there will be less competition at the word 
level, so that the words which do become active will reach higher activation 
levels. 
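The leveling effect of mutual inhibition can be illustrated with a minimal
sketch. It assumes an update rule of the general interactive activation form
(net input drives a node's activation toward a maximum of 1.0 or a minimum of
-0.2, and decay pulls it back toward a resting level of 0), applied to word
nodes that each receive the same constant bottom-up input; only positive
activations inhibit other nodes. The input value, decay rate, cycle count, and
inhibition strength are illustrative assumptions, not values taken from the
simulations reported here, and the function names are hypothetical. Without
inhibition, six equally supported words produce exactly three times the total
activation of two; with inhibition, the ratio should fall much closer to one.

```python
# Minimal sketch: mutual inhibition among word nodes compresses differences
# in total word-level activation. Parameter values are illustrative assumptions.

def run_word_layer(inputs, gamma, cycles=30, decay=0.07,
                   a_max=1.0, a_min=-0.2, rest=0.0):
    """Return final activations of word nodes with constant bottom-up inputs.

    inputs: bottom-up excitation for each word node (assumed constant).
    gamma:  word-to-word inhibition strength (assumed value).
    """
    acts = [rest] * len(inputs)
    for _ in range(cycles):
        # Only positive activations are passed on to other nodes.
        positive = [max(a, 0.0) for a in acts]
        total = sum(positive)
        new_acts = []
        for a, e, p in zip(acts, inputs, positive):
            # Each node is inhibited by every *other* active node.
            net = e - gamma * (total - p)
            # Net input drives activation toward a_max or a_min.
            effect = net * (a_max - a) if net > 0 else net * (a - a_min)
            # Decay pulls activation back toward its resting level.
            new_acts.append(a - decay * (a - rest) + effect)
        acts = new_acts  # synchronous update of all nodes
    return acts


def total_positive_activation(acts):
    return sum(max(a, 0.0) for a in acts)


# Six weakly supported candidate words versus two (input values hypothetical).
many, few = [0.05] * 6, [0.05] * 2
for gamma in (0.0, 0.21):
    ratio = (total_positive_activation(run_word_layer(many, gamma))
             / total_positive_activation(run_word_layer(few, gamma)))
    print(f"gamma={gamma}: total(many) / total(few) = {ratio:.2f}")
```

The many-candidate case stands in loosely for an item like TEEL and the
few-candidate case for an item like HOEM; the sketch makes no attempt to
reproduce the actual word-level activations of either.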

The situation is illustrated for the nonwords TEEL and HOEM in Figure 16.
While TEEL activates many more words, the total activation is not much
different in the two cases.

Figure 16. The number of words activated (top) and the total activation
at the word level (bottom) upon presentation of the nonwords TEEL and HOEM.

The total activation is not, of course, the whole story. The ratio of
friends to enemies is also important, and this ratio works against the high
cluster items more than the low cluster items. In McClelland and Johnston's
stimuli, only one of the low cluster frequency nonword pairs had critical
letters with any enemies at all! For 23 out of 24 pairs, there was at least
one friend (by virtue of the method of stimulus construction), and no enemies.
In contrast, for the high cluster frequency pairs, there was a wide range,
with some items having several more enemies than friends.

To simulate the McClelland and Johnston results, we had to select a subset
of their stimuli, since many of the words they used were not in our word
list. Since the stimuli had been constructed in sets containing a word pair,
a pseudoword pair, and a single letter pair differing by the same letters in
the same position (e.g., PEEL-PEEP, TEEL-TEEP, ___L-___P), we simply selected
all those sets in which both words in the pair appeared in our list. This
resulted in a sample of 10 high cluster frequency sets and 10 low cluster
frequency sets. The single letter stimuli derived from the high and low
cluster frequency pairs were also run through the simulation. Both members of
each pair were tested.

Since the stimuli were presented in the actual experiment blocked by
material type, we selected an optimal time for readout separately for words,
pseudowords, and single letters. Readout time was the same for high and low
cluster frequency items of the same type, since these were presented in a
mixed list in the actual experiment. The run shown in Table 5 used the
following parameters: letter to word inhibition was set to the low value
(.04); the input parameters associated with the moderate quality display were
used (feature to letter excitation = .005, inhibition = .15). The display was
presented for a duration of 15 cycles.

The simulation shows the same general pattern as the actual data. As in 
the actual data, the magnitude of the pseudoword advantage over single letters 
is just slightly smaller than the word advantage, and the effect of cluster
frequency is very slight. Qualitatively similar results are obtained when the 
input parameters associated with the very high quality display are used. For 
the word condition, it makes very little difference if the value of letter to 
word inhibition is high or low, except that the slight advantage for high 
cluster frequency words is eliminated. 

We have yet to consider how the model deals with unrelated letter
strings. This depends a little on the exact characteristics of the strings,
and the value of letter to word inhibition. With high letter to word
inhibition, unrelated letters fare no better than pseudowords: they fail to
excite any words, and there is no feedback. When the value of letter to word
inhibition gets low, there is some activity at the word level with many
so-called unrelated letter strings. Generally speaking, however, these strings
rarely have more than two letters in common with any one word. Thus, they
only tend to activate a few words very weakly, and because of the weakness of
the bottom-up excitation, competition among partially activated words keeps
any one from getting very active. So, little benefit results. When we ran our
simulation on randomly-generated consonant strings, there was only a 1% advan- 
tage over single letters. 
Some items which have been used as unpronounceable nonwords or unrelated
letter strings do produce a weak facilitation. We ran the nonwords used by
McClelland and Johnston (1977) in their Experiment 2. These items contain a
large number of vowels in positions which vowels tend to occupy in words, and
they therefore tend to activate more words than, say, random strings of
consonants. The simulation was run under the same conditions as the one
reported above for McClelland and Johnston's first experiment. The simulation
produced a slight advantage for letters in these nonwords, compared to single
letters, as did the experiment. In both the simulation and the actual
experiment, forced-choice performance was 4% more accurate for letters in
these unrelated letter strings than in single letter stimuli.

On the basis of this characteristic of our model, the results of one
experiment on the importance of vowels in reading may be reinterpreted.
Spoehr and Smith (1975) found that subjects were more accurate reporting 
letters in unpronounceable nonwords containing vowels than in all consonant 
strings. They interpreted the results as supporting the view that subjects 
parse letter strings into "Vocalic Center Groups." However, an alternative 
possible account is that the strings containing vowels had more letters in 
common with actual words than the all consonant strings. 

In summary, the model provides a good account of the perceptual advantage
for letters in pronounceable nonwords but not in unrelated letter strings. In
addition, it accounts for the dependence of the pseudoword advantage on
expectation, and for the lack of an effect of expectation on the advantage for
letters in words. Finally, the model accounts for the small difference between
performance on words and pseudowords when the subject is aware that the
stimuli include pseudowords, and for the absence of any really noticeable
cluster frequency effect.

Our examination of the model suggests that there are different ways
interactive activation can influence perception. When letter to word
inhibition is set to a high value, the system acts as a sharply tuned filter.
In this mode, the system will reinforce activations only of those patterns
which it has explicitly stored in particular nodes. When the same parameter is
set to a small value, the system allows nodes for stored patterns which are
similar to the new input to become partially activated, thereby permitting it
to reinforce activations of patterns which are not in fact stored. In this
mode the model shows the capacity to apply knowledge explicitly encoded as
spellings of particular words in such a way that it facilitates the processing
of stimuli that are similar to several stored patterns, but not identical to
any.
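The filtering arithmetic can be made concrete with a minimal sketch that
computes the net letter-to-word input a word node receives when every letter
of a four-letter display is fully active. It assumes a letter-to-word
excitation of .07 per matching letter, a value chosen here for illustration
(it is the value at which three matches exactly cancel one mismatch when
inhibition is .21); the function name and test words are likewise
hypothetical. With inhibition at .21, a word sharing three letters with the
display receives a net input of zero and a word sharing two is inhibited; at
.04, both receive positive input and can become partially active.

```python
def bottom_up_input(word, display, excitation=0.07, inhibition=0.04):
    """Net letter-to-word input, assuming every displayed letter is fully active.

    Each position where the word matches the display adds `excitation`;
    each mismatching position subtracts `inhibition` (both values assumed).
    """
    if len(word) != len(display):
        raise ValueError("word and display must be the same length")
    matches = sum(w == d for w, d in zip(word, display))
    mismatches = len(word) - matches
    return excitation * matches - inhibition * mismatches


for inhibition in (0.04, 0.21):
    inputs = {w: round(bottom_up_input(w, "SHIP", inhibition=inhibition), 2)
              for w in ("SHIP", "CHIP", "SHOP", "SHED")}
    print(inhibition, inputs)
```

Under these assumptions CHIP and SHOP receive .17 at the low setting but 0.0
at the high one, while SHED goes from weakly positive to clearly negative.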

The Role of Lexical Constraints 
The Johnston Experiment 

Several models which have been proposed to account for the word advantage 
rely on the idea that the context letters in a word facilitate performance by 
constraining the set of possible letters which might have been presented in 
the critical letter position. Models of this class predict that contexts
which strongly constrain what the target letter might be result in greater
accuracy of perception than more weakly constraining contexts. For example,
the context _HIP should facilitate the perception of an initial S more than
the context _INK. The reason is that _HIP is more strongly constraining,
since only three letters (S, C, and W) fit in the context to make a word,
compared to _INK, where nine letters (D, F, K, L, M, P, R, S, and W) fit in
the context to make a word. In a test of such models, Johnston (1978)
compared accuracy of perception of letters occurring in high and low
constraint contexts. The same target letters were tested in the same
positions in both cases. For example, the letters S and W were tested in the
high constraint _HIP context and the low constraint _INK context. Using bright
target/patterned mask conditions, Johnston found no difference in accuracy of
perception between letters in the high and low constraint contexts. The
results of this experiment are shown in Table 6. Johnston measured letter
perception in two ways. He not only asked the subjects to decide which of two 
letters had been presented (the forced-choice measure), but he also asked sub- 
jects to report the whole word and recorded how often they got the critical 
letter correct. No significant difference was observed in either case. In 
the forced choice there was a slight difference favoring low constraint items, 
but in the free report there was no difference at all. 

Although our model does use contextual constraints (as they are embodied 
in specific lexical items), it turns out that it does not predict that highly 
constraining contexts will facilitate perception of letters more than weakly
constraining contexts under bright target/pattern mask conditions. Under such
conditions, the role of the word level is not to help the subject select among
alternatives left open by an incomplete feature analysis process, but rather
to help maintain the activation of the nodes for the letters presented. 

In Johnston's experiments, only words were shown, so on the basis of our
interpretation of the Carr et al. (1978) findings mentioned above, we would
expect that subjects would tend to adopt a large value of letter to word
inhibition. If the .21 value were used, our model produces no difference
whatsoever between high and low constraint items. The reason is simply that
only the node for the word actually shown ever gets activated at all. The
nodes for all other words receive either net inhibition or a net neutral input
if they share three letters in common with the word shown.

Table 6

Actual & Simulated Results from Johnston (1978)
(Probability Correct)

                          Constraint
                       High        Low
Actual Results
   Forced Choice       .768        .795
   Free Report         .585
Simulation
   Forced Choice       .773        .763
   Free Report         .563

Note: Simulation was run using low letter to word inhibition and moderate
quality display parameters. Similar results are obtained using high quality
display parameters. There is no effect of constraints when high letter to
word inhibition is used.

If we assume that a small value of letter to word inhibition is used (.04
instead of .21), our model produces a very small advantage for high constraint
items. In this case, the presentation of a target word results in the weak
activation of the words which share three letters in common with the target.
Some of these words are "friends" of the critical letter in that they contain
the actual critical letter shown, as well as two of the letters from the
context (e.g., 'shop' is a friend of the initial S in SHIP). Some of the
words, however, are "enemies" of the critical letter, in that they contain the
three context letters of the word, but a different letter in the critical
letter position (e.g., 'chip' and 'whip' are enemies of the initial S in
SHIP). From our point of view, Johnston's constraint manipulation is
essentially a manipulation of the number of enemies the critical letter has in
the given context. It turns out that Johnston's high and low constraint
stimuli have equal numbers of friends, on the average, but (by design), the
high constraint items have fewer enemies, as shown in Table 7.
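The friend and enemy definitions above can be stated precisely in a short
sketch. The function below is a hypothetical helper, not part of the model
itself: for a target word and a critical position, it counts as friends the
other words that keep the critical letter and exactly two of the three context
letters in place, and as enemies the words that keep all three context letters
but change the critical letter. The word list is a toy example, not the
lexicon used in the simulations.

```python
def friends_and_enemies(target, position, lexicon):
    """Return (friends, enemies) of the letter at `position` in `target`.

    Friend: a different word with the same critical letter that also shares
    all but one of the context letters in place (two of three for 4 letters).
    Enemy: a word sharing every context letter in place but with a different
    letter in the critical position.
    """
    context = [i for i in range(len(target)) if i != position]
    friends, enemies = [], []
    for word in lexicon:
        if len(word) != len(target) or word == target:
            continue
        shared_context = sum(word[i] == target[i] for i in context)
        if word[position] == target[position] and shared_context == len(context) - 1:
            friends.append(word)
        elif word[position] != target[position] and shared_context == len(context):
            enemies.append(word)
    return friends, enemies


lexicon = ["ship", "shop", "slip", "shin", "chip", "whip", "shed", "sink"]  # toy list
print(friends_and_enemies("ship", 0, lexicon))
```

For SHIP, this toy list yields SHOP, SLIP, and SHIN as friends of the initial
S, and CHIP and WHIP as its enemies; SHED and SINK share too few letters to
count as either.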

Using a low value for the letter to word inhibition results in the friends
and enemies of the target word receiving some activation. Under these
conditions (with either high or moderate quality input parameters) our model
does produce a slight advantage for the high constraint items. The reason for
the slight effect is that lateral interference at the word level lets the
enemies of the critical letter keep the node for the word presented and the
nodes for the friends from getting quite as strongly activated as they would
otherwise. The effect is quite small for two reasons. First, the node for the
word presented receives four excitatory inputs from the letter level, while
all other words can receive at most three excitatory inputs and at least one
inhibitory input. As we saw in the case of the word CAVE, the node for the
correct word dominates the activations at the word level, and is
predominantly responsible for any feedback to the letter level. Second, while
the high constraint items have fewer enemies, by more than a two to one
margin, both high and low constraint items have, on the average, more friends
than enemies. The friends of the target letter work with the actual word shown
to keep the activations of the enemies in check, thereby reducing the extent
of their inhibitory effect still further. The ratio of the number of friends
over the total number of neighbors is not all that different in the two
conditions, except in the first serial position.

Table 7

Friends and Enemies of the Critical Letters in the
Stimuli Used by Johnston (1978)

               High Constraint              Low Constraint
           friends  enemies  ratio      friends  enemies  ratio
pos 1       3.33     2.22     .60        3.61     6.44     .36
pos 2       9.17     1.0?     .90        6.63     2.88     .70
pos 3       6.30     1.70     .79        7.75     4.30     .64
pos 4       4.96     1.67     .75        6.67     3.50     .66
ave         5.93     1.65                6.17     4.27
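The ratio column in Table 7 appears to be the proportion of a critical
letter's neighbors that are friends, friends / (friends + enemies), which is
the quantity the text refers to as the ratio of friends over the total number
of neighbors. Recomputing it from the tabled averages reproduces the tabled
values; the helper name below is hypothetical.

```python
def friend_ratio(friends, enemies):
    """Proportion of a critical letter's neighbors that are friends."""
    return friends / (friends + enemies)


# Position 1 averages from Table 7 (Johnston, 1978 stimuli).
print(round(friend_ratio(3.33, 2.22), 2))  # high constraint -> 0.6
print(round(friend_ratio(3.61, 6.44), 2))  # low constraint  -> 0.36
```

The two conditions differ sharply in position 1 (.60 versus .36), while in
position 4, for example, the ratios are .75 and .66.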

This discussion may give the impression that contextual constraint is not
an important variable in our model. In fact, it is quite powerful. But its
effects are obscured in the Johnston experiment because of the strong
dominance of the target word when all the features are extracted, and the fact
that we are concerned with the likelihood of perceiving a particular letter 
rather than performance in identifying correctly what whole word was shown. 
We will now consider an experiment in which contextual constraints play a 
strong role, because the characteristics just mentioned are absent. 
The Broadbent and Gregory Experiment

Up to now we have found no evidence that either bigram frequency or lexical
constraints have any effect on performance. However, in experiments using
the traditional whole report method these variables have been shown to have
substantial effects. Various studies have shown that recognition thresholds
are lower, or recognition accuracy higher at a fixed recognition threshold
value, when relatively unusual words are used (Bouwhuis, 1979; Havens &
Foote, 1963; Newbigging, 1961). Such items tend to be low in bigram frequency
and, at the same time, high in lexical constraint.

In one experiment, Broadbent and Gregory (1968) investigated the role of
bigram frequency at two different levels of word frequency and found an
interesting interaction. We now consider how our model can account for their
results. To begin, it is important to note that the visual conditions of
their experiment were quite different from those of McClelland and Johnston
(1977), in which the data and our model failed to show a bigram frequency
effect, and of Johnston (1978), in which the data and the model showed no
constraint effect. The conditions were like the dim target/blank mask
conditions discussed above, in that the target was shown briefly against an
illuminated background, without being followed by any kind of mask. The
dependent measure was the probability of correctly reporting the whole word.
The results are indicated in Table 8. A slight advantage for high bigram
frequency items over low bigram frequency was obtained for frequent words,
although it was not consistent over different subsets of items tested. The
main finding is that words of low bigram frequency had an advantage among
infrequent words. For these stimuli, higher bigram frequency actually resulted
in a lower percent correct.

Table 8

Actual and Simulated Results of the
Broadbent & Gregory (1968) Experiment
(Probability Correct Whole Report)

                   Word Frequency
                   High      Low
Actual Data
   High BF         .645      .431
   Low BF          .637      .583
Simulation
   High BF         .414      .212
   Low BF          .394      .371



Unfortunately, Broadbent and Gregory used 5 letter words, so we were 
unable to run a simulation on their actual stimuli. However, we were able to 
select a subset of the stimuli used in the McClelland and Johnston (1977) 
experiment which fit the requirements of the Broadbent and Gregory design. We 
therefore presented these stimuli to our model, under the presentation
parameters used in simulating the blank mask condition of the Johnston and
McClelland (1973) experiment above. The only difference was that the output
was taken, not from the letter level, as in all of our other simulations, but
directly from the word level. The low value of letter to word inhibition was
used, since with a high value few words ever become activated on the basis of
partial feature information. The results of the simulation, shown in Table 8
below the actual data, replicate the obtained pattern very nicely. The
simulation produced a large advantage for the low bigram items among the
infrequent words, and produced a slight advantage for high bigram frequency
items among the frequent words.

In our model, low frequency words of high bigram frequency are most
poorly recognized because these are the words which have the largest number of
neighbors. Under conditions of incomplete feature extraction, which we expect
to prevail under these visual conditions, the more neighbors a word has the
more likely it is to be confused with some other word. This becomes
particularly important for lower frequency words. As we have seen, if both a
low frequency word and a high frequency word are equally compatible with the
detected portion of the input, the higher frequency word will tend to
dominate. When incomplete feature information is extracted, the activation of
the target relative to its neighbors is much lower than when all the features
have been seen. Indeed, some neighbors may turn out to be just as compatible
with the features extracted as the target itself. Under these circumstances,
the word of the highest frequency will tend to gain the upper hand. The
probability of correctly reporting a low frequency word will therefore be much
more strongly influenced by the presence of a high frequency neighbor
compatible with the input than the other way around.

But why does the model actually produce a slight reversal with high
frequency words? Even here, it would seem that the presence of numerous
neighbors would tend to hurt instead of facilitate performance. However, we
have forgotten the fact that the activation of neighbors can be beneficial as
well as harmful. The active neighbors produce feedback which strengthens most
or all of the letters, and these in turn increase the activation of the node
for the word shown. As it happens, there turns out to be a delicate balance
for high frequency words between the negative and positive effects of
neighbors, which only slightly favors the words with more neighbors. Indeed,
the effect only holds for some of these items. We have not yet had the
opportunity to explore what all the factors are which determine whether the
effect of neighbors will balance out to be positive or negative in individual
cases.

Different Effects in Different Experiments 

This discussion of the Broadbent and Gregory experiment indicates once 
again that our model is something of a chameleon. The model produces no 
effect of constraint or bigram frequency under the visual conditions and test- 
ing procedures used in the Johnston (1978) and McClelland and Johnston (1977) 
experiments, but we do obtain such effects under the conditions of the
Broadbent and Gregory (1968) experiment. This flexibility of the model, of
course, is fully required by the data. While there are other models of word
perception which can account for one or the other type of result, to our
knowledge, the model presented here is the only scheme that has been worked
out to account for both.

Discussion 

The interactive activation model does a good job accounting for the
results of the literature we have reviewed on the perception of letters in
words and nonwords. The model provides a unified account for the results of a
variety of experiments, and provides a framework in which the effects of both
physical and psychological manipulations of the characteristics of the
experiments may be accounted for. In addition, as we shall see in Part II, the
model readily accounts for a variety of additional phenomena of word
perception. Moreover, as we shall also show, it can be readily extended beyond
its current domain of applicability with substantial success. In Part II we
will report a number of experiments demonstrating what we call "Context
Enhancement Effects," and show how the model can account for the major
findings in these experiments.

However, there are some problems which we have either ignored or failed
to solve which remain to be resolved. First, we have ignored the fact that
there is a high degree of positional uncertainty in reports of letters,
particularly letters in unrelated strings, but also in reports of letters in
words and pseudowords on occasion (Estes, 1975; McClelland, 1976; McClelland &
Johnston, 1977). It is not entirely clear whether these uncertainty effects
arise in the perceptual system itself, in the readout process, or both. It is 
quite possible that letters are kept well-organized by position in the
activation system, but the process of reading them out is not easily
restricted to a single position channel (cf. Eriksen & Eriksen, 1972). Of
course, it is also quite possible that much of the problem arises from
positional uncertainty within the activation system itself. Although we have
not attempted to model these effects in this paper, our model could easily be
modified to account for the rearrangements of letters and the fact that they
occur more frequently in unrelated letter strings. Suppose, for example, that
the activations of letters were distributions of activation along a spatial
dimension, instead of points of activation assigned to a particular point in
an array. Then the activations for letters in adjacent positions would
overlap, and if there was noise in the location of the mean of the
distribution of activation produced by a letter presented in a particular
position, order errors would be expected. Under these circumstances, feedback
from the word level could serve to reinforce that portion of the distribution
of activation in the correct spatial position, thereby shifting the mean of
the distribution toward the right position.
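The proposal can be illustrated with a minimal sketch under illustrative
assumptions: a letter's bottom-up activation is a Gaussian profile along a
continuous position axis, centred at a noisily displaced location; word-level
feedback adds a second, weaker Gaussian centred on the letter's correct
position; and the perceived location is taken to be the centroid of the
combined profile. The width, weights, positions, and function names are all
hypothetical. With no feedback the centroid stays at the displaced location;
with feedback it moves toward the correct position.

```python
import math


def gaussian(x, mean, sd):
    """Unnormalized Gaussian profile (peak height 1)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2)


def activation_centroid(noisy_mean, correct_pos, feedback_weight,
                        sd=0.5, lo=0.0, hi=5.0, step=0.01):
    """Centroid of one letter's spatial activation profile.

    Bottom-up input: a Gaussian centred on the noisily located position.
    Feedback: a second Gaussian, scaled by feedback_weight, centred on the
    letter's correct position. All values are illustrative assumptions.
    """
    n = int(round((hi - lo) / step))
    xs = [lo + i * step for i in range(n + 1)]  # grid over the position axis
    profile = [gaussian(x, noisy_mean, sd)
               + feedback_weight * gaussian(x, correct_pos, sd) for x in xs]
    return sum(x * a for x, a in zip(xs, profile)) / sum(profile)


print(activation_centroid(2.4, 2.0, feedback_weight=0.0))  # close to 2.4
print(activation_centroid(2.4, 2.0, feedback_weight=0.5))  # pulled toward 2.0
```

Because the two profiles have equal width here, the centroid is simply their
weight-averaged mean, about (2.4 + 0.5 * 2.0) / 1.5; unequal widths or a
different readout of location would change the numbers but not the direction
of the shift.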

Another thing that we have not considered very fully is the serial
position curve. In general, it appears that performance is more accurate on
the end letters in multi-letter strings, particularly the first letter. The
effect is much more striking for unrelated letters than for pseudowords or
words (McClelland & Johnston, 1977). While part of this effect may be due to
reduced lateral masking of end letters and/or to a reduced opportunity for
order error at the ends of the string, it seems likely that the first position
advantage reflects some sort of processing priority given to the first letter.
Some or all of this effect could be accommodated by our model by assuming that
the strength of the effect exerted by the letter in a given position is
influenced by the deployment of attention, and that attention is deployed
preferentially to the first letter position.

A different possibility that we considered is that part of the serial
position effect could be due to neighborhood effects. However, these would, if
anything, tend to hurt the first letter position relative to other positions,
for the following reason. The first letter is, generally speaking, the letter
which has the most enemies. That is, the largest gangs tend to be those
consisting of the last three letters of the item and leaving out the first
letter. Thus, the word level will tend to produce greater feedback for the
second, third and fourth letters than for the first. In view of this, we can
see that one reason for directing attention predominantly to the first letter
would be to offset this gang effect.

There are some effects of set on word perception which we have not con- 
sidered. Johnston and McClelland (1974) found that perception of letters in 
words was actually hurt if subjects focused their attention on a single letter 
position in the word (see also Holender, 1979; Johnston, 1974). One possible
interpretation of these effects would be that they result from the narrowing
of the focus of attention so that visual information from the nontarget
letters is simply not made available to the letter and word levels. Another
possibility is that the focusing of attention on the contents of a single
letter position disrupts the process of directing the letter information into
the correct position-specific channels. It seems likely that either of these
possibilities could be worked into our model.
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In all but one of the experiments we have simulatedi the primary (if not 
the only) data for the experiments were obtained from forced choJr<^es between 
pairs of letters, or strings differing by a single letter. In these cases^ it 
seemed to us most natural ""o rely on the output of the letter level as the 
basis for responding. However, it may well be that subjects often base their 
responses on the output of the word level. Indeed, we have assumed that they 
do in experiments like the Broadbent and Gregory (1968) study, in which sub- 
jects were told to report what word they thought they had seen. This may also 
tifjve happened in the McClelland and Johnston (1977) and Johnston (1978) stu- 
dies, in which subjects were instructed to report all four letters before the 
forced choice on some triais. Indeed, both studies found that the probability 
oi' reporting all four letters correctly for letters in words v;as greater than 
we would expect given independent readout of each letter position. It seepis 
natural to account for these completely correct reports by assuming that they 
often occurred on occasions where the subject encoded che item as a word. 
Even in experiments where only a forced choice is obtained, subjects may still 
come away with a word, rather than a sequence of letters on many occasions. 

In the early phases of the development of our model, we explicitly included
the possibility of output from the word level as well as the letter level. We
assumed that the subject would either encode a word, with some probability
dependent on the activations at the word level, or, failing that, would encode
some letter for each letter position dependent on the activations at the
letter level. However, we found that simply relying on the letter level
permitted us to account equally well for the results. In essence, the reason is
that the word-level information is incorporated into the activations at the
letter level because of the feedback, so that the word level is largely
redundant. In addition, of course, readout from the letter level is necessary to
the model's account of performance with nonwords. Since letter-level readout is
adequate to account for all of the forced-choice data, and since it is difficult
to know exactly how much of the detail in free-report data should be attributed
to perceptual processes and how much to such things as possible biases in the
readout processes, we have stuck for the present with readout from the
letter level.
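The difference between the earlier two-stage readout and purely independent letter readout can be made concrete with a little arithmetic. In the sketch below, which is a hedged illustration and not the model's actual readout procedure, `p_word` and the letter-level probabilities are hypothetical numbers, the function names are invented, and we assume for simplicity that an encoded word is always the correct one. Mixing whole-word encodings with independent letter readout makes the letter positions positively correlated, so the probability of getting all four letters right exceeds the product of the per-position accuracies, which is the pattern noted above for the McClelland and Johnston (1977) and Johnston (1978) data.

```python
from math import prod


def independent_all_correct(marginals):
    """P(all letters correct) if each position is read out independently."""
    return prod(marginals)


def two_stage_readout(p_word, letter_probs):
    """Hypothetical two-stage readout: with probability p_word the correct
    word is encoded (so every letter is right); otherwise each position is
    read out independently with the given letter-level probabilities.
    Returns (per-position marginal accuracies, P(all letters correct))."""
    marginals = [p_word + (1.0 - p_word) * p for p in letter_probs]
    joint = p_word + (1.0 - p_word) * prod(letter_probs)
    return marginals, joint


marginals, joint = two_stage_readout(0.3, [0.6] * 4)
print(marginals, joint, independent_all_correct(marginals))
```

Here each position is correct with probability 0.72, so independent readout predicts 0.72 to the fourth power, about 0.269, for whole-string accuracy, while the two-stage scheme gives 0.3 + 0.7 × 0.6⁴, about 0.391. Setting `p_word` to 0 makes the two predictions coincide.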

Another decision which we adopted in order to keep the model within
bounds was to exclude the possibility of processing interactions between the
visual and phonological systems. However, in the model as sketched at the
outset (Figure 1), activations at the letter level interacted with a
phonological level as well as the word level. As we will show in Part II, some
of our Context Enhancement results with pseudowords are difficult to account
for in the simplified framework applied in Part I. To accommodate the findings,
it may be appropriate to incorporate interactions between the letter level and
the phoneme level.

Another simplification we have adopted in Part I has been to consider 
only cases in which individual letters or strings of letters were presented in 
the absence of linguistic context. In Part II we will consider the effects of 
introducing contextual inputs to the word level, and we will explore how the
model might work in processing spoken words in context as well. 

Thus far in this discussion we have commented on how completely the
interactive activation model accounts for the data in the literature on word
perception and related domains. But the model is also interesting for reasons
quite apart from its success in accounting for the data obtained in particular
experiments. It also illustrates the operation of a kind of mechanism which
we believe deserves further exploration, not only for word perception but for
other perceptual domains and other aspects of information processing as well.
Our various simulations show a number of different ways an activation
mechanism can be used to process information. It can fill in missing
information in familiar words. It can act as a sharply tuned filter, focusing
activation on a single word consistent with all of the information presented.
Or it can synthesize novel percepts, making use of feedback from a number of
partially relevant partial activations. In Part II we will consider a few of
the ways such a mechanism might be used in such diverse tasks as
categorization, memory search, and retrieval.
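The first of these behaviours, filling in missing information, can be illustrated with a toy network. What follows is a hypothetical miniature in the spirit of the model, not the simulation program used here: the four-word lexicon (WORK, WORD, WEAK, FOLK), every parameter value, and the names `run` and `step` are invented for illustration, and the sketch omits the feature level, letter-letter inhibition, and frequency-dependent resting levels. Letters excite the words they fit and inhibit those they do not, words inhibit one another and feed excitation back to their letters, and each unit is updated with an assumed rule in which positive net input drives activation toward a ceiling, negative net input drives it toward a floor, and activation decays toward rest. Presenting WO_K, with no input at the third position, leaves R as the most active letter there purely through feedback.

```python
import string

# Hypothetical toy lexicon and parameters, chosen only for illustration.
LEXICON = ["WORK", "WORD", "WEAK", "FOLK"]
LETTERS = string.ascii_uppercase
MAX_A, MIN_A, REST, DECAY = 1.0, -0.2, 0.0, 0.07
EXT = 0.05         # bottom-up input to each presented letter per cycle
L_TO_W_EXC = 0.07  # letter -> word excitation (letter fits the word)
L_TO_W_INH = 0.04  # letter -> word inhibition (letter does not fit)
W_TO_L_EXC = 0.2   # word -> letter feedback excitation
W_TO_W_INH = 0.21  # lateral inhibition among word units


def pos(x):
    """Only positively active units send output."""
    return max(x, 0.0)


def step(act, net):
    """Push toward MAX_A (excitation) or MIN_A (inhibition); decay to REST."""
    if net > 0:
        delta = net * (MAX_A - act)
    else:
        delta = net * (act - MIN_A)
    return act + delta - DECAY * (act - REST)


def run(display, steps=30):
    """Run the toy network on a 4-character display; '_' means no input."""
    letters = [{c: 0.0 for c in LETTERS} for _ in display]
    words = {w: 0.0 for w in LEXICON}
    for _ in range(steps):
        # Word net input: letter support/opposition plus word competition.
        total_w = sum(pos(a) for a in words.values())
        word_net = {}
        for w in LEXICON:
            net = 0.0
            for i, ch in enumerate(w):
                for c, a in letters[i].items():
                    weight = L_TO_W_EXC if c == ch else -L_TO_W_INH
                    net += weight * pos(a)
            net -= W_TO_W_INH * (total_w - pos(words[w]))
            word_net[w] = net
        # Letter net input: the display plus feedback from words.
        letter_net = []
        for i, shown in enumerate(display):
            nets = {}
            for c in LETTERS:
                feedback = sum(pos(words[w]) for w in LEXICON if w[i] == c)
                nets[c] = (EXT if c == shown else 0.0) + W_TO_L_EXC * feedback
            letter_net.append(nets)
        # Update all units synchronously from the nets computed above.
        words = {w: step(words[w], word_net[w]) for w in LEXICON}
        letters = [{c: step(letters[i][c], letter_net[i][c]) for c in LETTERS}
                   for i in range(len(display))]
    return letters, words


letters, words = run("WO_K")
print(max(letters[2], key=letters[2].get), max(words, key=words.get))
```

In this toy, R at the third position is supported by both WORK and WORD, while WORK wins the competition at the word level because it is the only word consistent with all three presented letters.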